[PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task

Will Deacon posted 2 patches 2 years, 7 months ago
[PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Will Deacon 2 years, 7 months ago
set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.

For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().

Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.

Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <will@kernel.org>
---

Note: We wondered whether it was worth calling guarantee_online_cpus()
if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
this path is only called when the effective mask changes, it didn't
seem appropriate. Ultimately, if you have 32-bit tasks attached to a
cpuset containing only 64-bit cpus, then the affinity is going to be
forced.

 kernel/cgroup/cpuset.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8552cc2c586a..f15fb0426707 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
+ * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
  * cpuset membership stays stable.
  */
-static void update_tasks_cpumask(struct cpuset *cs)
+static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+
+		cpumask_and(new_cpus, cs->effective_cpus,
+			    task_cpu_possible_mask(task));
+		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	spin_unlock_irq(&callback_lock);
 
 	if (adding || deleting)
-		update_tasks_cpumask(parent);
+		update_tasks_cpumask(parent, tmp->new_cpus);
 
 	/*
 	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
 
-		update_tasks_cpumask(cp);
+		update_tasks_cpumask(cp, tmp->new_cpus);
 
 		/*
 		 * On legacy hierarchy, if the effective cpumask of any non-
@@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	}
 
-	update_tasks_cpumask(parent);
+	update_tasks_cpumask(parent, tmpmask.new_cpus);
 
 	if (parent->child_ecpus_count)
 		update_sibling_cpumasks(parent, cs, &tmpmask);
@@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	 * as the tasks will be migrated to an ancestor.
 	 */
 	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated && !nodes_empty(cs->mems_allowed))
 		update_tasks_nodemask(cs);
 
@@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (cpus_updated)
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated)
 		update_tasks_nodemask(cs);
 }
-- 
2.39.1.456.gfc5497dd1b-goog
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Tejun Heo 2 years, 7 months ago
On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>

Applied to cgroup/for-6.2-fixes.

Thanks.

-- 
tejun
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Waiman Long 2 years, 7 months ago
On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
>
>   kernel/cgroup/cpuset.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8552cc2c586a..f15fb0426707 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
>   /**
>    * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>    * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> + * @new_cpus: the temp variable for the new effective_cpus mask
>    *
>    * Iterate through each task of @cs updating its cpus_allowed to the
>    * effective cpuset's.  As this function is called with cpuset_rwsem held,
>    * cpuset membership stays stable.
>    */
> -static void update_tasks_cpumask(struct cpuset *cs)
> +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>   {
>   	struct css_task_iter it;
>   	struct task_struct *task;
> @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
>   		if (top_cs && (task->flags & PF_KTHREAD) &&
>   		    kthread_is_per_cpu(task))
>   			continue;
> -		set_cpus_allowed_ptr(task, cs->effective_cpus);
> +
> +		cpumask_and(new_cpus, cs->effective_cpus,
> +			    task_cpu_possible_mask(task));
> +		set_cpus_allowed_ptr(task, new_cpus);
>   	}
>   	css_task_iter_end(&it);
>   }
> @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (adding || deleting)
> -		update_tasks_cpumask(parent);
> +		update_tasks_cpumask(parent, tmp->new_cpus);
>   
>   	/*
>   	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
> @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>   		WARN_ON(!is_in_v2_mode() &&
>   			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
>   
> -		update_tasks_cpumask(cp);
> +		update_tasks_cpumask(cp, tmp->new_cpus);
>   
>   		/*
>   		 * On legacy hierarchy, if the effective cpumask of any non-
> @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>   		}
>   	}
>   
> -	update_tasks_cpumask(parent);
> +	update_tasks_cpumask(parent, tmpmask.new_cpus);
>   
>   	if (parent->child_ecpus_count)
>   		update_sibling_cpumasks(parent, cs, &tmpmask);
> @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>   	 * as the tasks will be migrated to an ancestor.
>   	 */
>   	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated && !nodes_empty(cs->mems_allowed))
>   		update_tasks_nodemask(cs);
>   
> @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (cpus_updated)
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated)
>   		update_tasks_nodemask(cs);
>   }

Acked-by: Waiman Long <longman@redhat.com>

This change is good for backporting to stable releases. For the latest 
kernel, I will prefer to centralize the masking in 
__set_cpus_allowed_ptr() where a scratch_mask is already being used for 
masking purpose. Let get this patch merged now, I will send a patch to 
move off the masking afterward.

Cheers,
Longman
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Peter Zijlstra 2 years, 7 months ago
On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
> 
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.

Right, so the case above where the cpuset is shrunk to 0-1, then the
intersection between the cpuset and task_cpu_possible_mask() is empty
and it currently simply fails to update mask.

I argued it was probably desired to walk up the tree to find a wider
parent until the intersection of {cpuset, task_cpu_possible_mask,
online} becomes non-empty.

But I suppose that can wait until we have more time.
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Waiman Long 2 years, 7 months ago
On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.

Now I see how the sched_setaffinity() change is impacting arm64. Instead 
of putting in the bandage in cpuset. I would suggest doing another cpu 
masking in __set_cpus_allowed_ptr() similar to what is now done for 
user_cpus_ptr.

Another suggestion that I have is to add a helper like 
has_task_cpu_possible_mask() that returns a true/false value. In this 
way, we only need to do an additional masking if we have some mismatched 
32-bit only cpus available in the system running 32-bit binaries so that 
we can skip this step in most cases compiling them out in non-arm64 systems.

By doing that, we may be able to remove some of the existing usages of 
task_cpu_possible_mask().

Thought?

Cheers,
Longman
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Peter Zijlstra 2 years, 7 months ago
On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> > affinity mask is not a subset of the task_cpu_possible_mask() for the
> > task being updated. Consequently, on a heterogeneous system with cpusets
> > spanning the different CPU types, updates to the cgroup hierarchy can
> > silently fail to update task affinities when the effective affinity
> > mask for the cpuset is expanded.
> > 
> > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> > the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> > task to a cpuset containing CPUs 0-2 will correctly affine the task to
> > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> > the affinity mask of the 32-bit task because update_tasks_cpumask() will
> > pass the full 0-3 mask to set_cpus_allowed_ptr().
> > 
> > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> > and use it to mask the 'effective_cpus' mask with the possible mask for
> > each task being updated.
> > 
> > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> > 
> > Note: We wondered whether it was worth calling guarantee_online_cpus()
> > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> > this path is only called when the effective mask changes, it didn't
> > seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> > cpuset containing only 64-bit cpus, then the affinity is going to be
> > forced.
> 
> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
> putting in the bandage in cpuset. I would suggest doing another cpu masking
> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.

NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.

Masking the offline CPUs is *WRONG*.
Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Posted by Waiman Long 2 years, 7 months ago
On 2/1/23 04:15, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
>>> affinity mask is not a subset of the task_cpu_possible_mask() for the
>>> task being updated. Consequently, on a heterogeneous system with cpusets
>>> spanning the different CPU types, updates to the cgroup hierarchy can
>>> silently fail to update task affinities when the effective affinity
>>> mask for the cpuset is expanded.
>>>
>>> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
>>> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
>>> task to a cpuset containing CPUs 0-2 will correctly affine the task to
>>> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
>>> the affinity mask of the 32-bit task because update_tasks_cpumask() will
>>> pass the full 0-3 mask to set_cpus_allowed_ptr().
>>>
>>> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
>>> and use it to mask the 'effective_cpus' mask with the possible mask for
>>> each task being updated.
>>>
>>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>
>>> Note: We wondered whether it was worth calling guarantee_online_cpus()
>>> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
>>> this path is only called when the effective mask changes, it didn't
>>> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
>>> cpuset containing only 64-bit cpus, then the affinity is going to be
>>> forced.
>> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
>> putting in the bandage in cpuset. I would suggest doing another cpu masking
>> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.
> NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.
>
> Masking the offline CPUs is *WRONG*.
>
This patch is not related to offline cpus at all. It is all about the 
32-bit misfit cpus in some arm64 system.

Cheers,
Longman