[cgroup/for-6.20 PATCH 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()

Waiman Long posted 4 patches 1 month, 2 weeks ago
There is a newer version of this series
Posted by Waiman Long 1 month, 2 weeks ago
In commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
& make update_cpumasks_hier() handle remote partition"), the
compute_effective_exclusive_cpumask() helper was extended to
strip exclusive CPUs from siblings when computing effective_xcpus
(cpuset.cpus.exclusive.effective). This helper was later renamed to
compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
CPU mask computation logic").

This helper is supposed to be used consistently to compute
effective_xcpus. However, there is an exception within the callback
critical section in update_cpumasks_hier() when exclusive_cpus of a
valid partition root is empty. This can cause the effective_xcpus value
to differ depending on where exactly it was last computed. Fix this by using
compute_excpus() in this case to give a consistent result.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 3d2d28f0fd03..850334dbc36a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2050,6 +2050,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		struct cpuset *parent = parent_cs(cp);
 		bool remote = is_remote_partition(cp);
 		bool update_parent = false;
+		bool empty_xcpus;
 
 		old_prs = new_prs = cp->partition_root_state;
 
@@ -2160,20 +2161,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 			new_prs = cp->partition_root_state;
 		}
 
+		empty_xcpus = cpumask_empty(cp->exclusive_cpus);
 		spin_lock_irq(&callback_lock);
 		cpumask_copy(cp->effective_cpus, tmp->new_cpus);
 		cp->partition_root_state = new_prs;
-		if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
+		if (((new_prs > 0) && empty_xcpus) ||
+		    ((cp != cs) && !empty_xcpus))
 			compute_excpus(cp, cp->effective_xcpus);
-
-		/*
-		 * Make sure effective_xcpus is properly set for a valid
-		 * partition root.
-		 */
-		if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
-			cpumask_and(cp->effective_xcpus,
-				    cp->cpus_allowed, parent->effective_xcpus);
-		else if (new_prs < 0)
+		if (new_prs < 0)
 			reset_partition_data(cp);
 		spin_unlock_irq(&callback_lock);
 
-- 
2.52.0
Re: [cgroup/for-6.20 PATCH 2/4] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
Posted by Chen Ridong 1 month, 1 week ago

On 2025/12/25 15:30, Waiman Long wrote:
> In commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check()
> & make update_cpumasks_hier() handle remote partition"), the
> compute_effective_exclusive_cpumask() helper was extended to
> strip exclusive CPUs from siblings when computing effective_xcpus
> (cpuset.cpus.exclusive.effective). This helper was later renamed to
> compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive
> CPU mask computation logic").
> 
> This helper is supposed to be used consistently to compute
> effective_xcpus. However, there is an exception within the callback
> critical section in update_cpumasks_hier() when exclusive_cpus of a
> valid partition root is empty. This can cause the effective_xcpus value
> to differ depending on where exactly it was last computed. Fix this by using
> compute_excpus() in this case to give a consistent result.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 3d2d28f0fd03..850334dbc36a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2050,6 +2050,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>  		struct cpuset *parent = parent_cs(cp);
>  		bool remote = is_remote_partition(cp);
>  		bool update_parent = false;
> +		bool empty_xcpus;
>  
>  		old_prs = new_prs = cp->partition_root_state;
>  
> @@ -2160,20 +2161,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>  			new_prs = cp->partition_root_state;
>  		}
>  
> +		empty_xcpus = cpumask_empty(cp->exclusive_cpus);
>  		spin_lock_irq(&callback_lock);
>  		cpumask_copy(cp->effective_cpus, tmp->new_cpus);
>  		cp->partition_root_state = new_prs;
> -		if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
> +		if (((new_prs > 0) && empty_xcpus) ||
> +		    ((cp != cs) && !empty_xcpus))
>  			compute_excpus(cp, cp->effective_xcpus);

The current logic for determining when to recompute effective_xcpus is difficult to follow.
Can we simplify it as follows?

	if (new_prs > 0)
		compute_excpus(cp, cp->effective_xcpus);
	else
		reset_partition_data(cp);

This would make the intent clearer: if cp is a valid partition, we recompute its effective_xcpus;
otherwise, we reset the partition data.
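
To make the comparison concrete, the two forms can be modelled in plain
userspace C. This is only a sketch, not kernel code: struct xcpus_action,
patch_action(), review_action() and count_differences() are hypothetical
names, new_prs is reduced to its sign (-1, 0, 1), and is_cs stands for
"cp == cs". It records which of compute_excpus() and reset_partition_data()
each form would call, without modelling what those helpers do.

```c
#include <stdbool.h>

/* Which helpers run inside the callback_lock section (model only). */
struct xcpus_action {
	bool compute;	/* compute_excpus(cp, cp->effective_xcpus) is called */
	bool reset;	/* reset_partition_data(cp) is called */
};

/* Condition as written in the patch; is_cs models "cp == cs". */
static struct xcpus_action patch_action(int new_prs, bool empty_xcpus,
					bool is_cs)
{
	struct xcpus_action a;

	a.compute = (new_prs > 0 && empty_xcpus) || (!is_cs && !empty_xcpus);
	a.reset = new_prs < 0;
	return a;
}

/* Simplified if/else form suggested in the review. */
static struct xcpus_action review_action(int new_prs)
{
	struct xcpus_action a;

	a.compute = new_prs > 0;
	a.reset = !a.compute;
	return a;
}

/* Count the input combinations on which the two forms act differently. */
static int count_differences(void)
{
	static const int prs[] = { -1, 0, 1 };	/* sign of new_prs only */
	int i, e, c, n = 0;

	for (i = 0; i < 3; i++) {
		for (e = 0; e < 2; e++) {
			for (c = 0; c < 2; c++) {
				struct xcpus_action p, r;

				p = patch_action(prs[i], e != 0, c != 0);
				r = review_action(prs[i]);
				if (p.compute != r.compute || p.reset != r.reset)
					n++;
			}
		}
	}
	return n;
}
```

Under this model the two forms are not equivalent. For new_prs == 0 the
if/else form calls reset_partition_data() while the patch calls neither
helper or only compute_excpus(). For new_prs > 0 with cp == cs and a
non-empty exclusive_cpus, the if/else form recomputes effective_xcpus while
the patch does not. Whether those differences matter in practice is a
separate question that the model does not answer.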

> -
> -		/*
> -		 * Make sure effective_xcpus is properly set for a valid
> -		 * partition root.
> -		 */
> -		if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus))
> -			cpumask_and(cp->effective_xcpus,
> -				    cp->cpus_allowed, parent->effective_xcpus);
> -		else if (new_prs < 0)
> +		if (new_prs < 0)
>  			reset_partition_data(cp);
>  		spin_unlock_irq(&callback_lock);
>  

-- 
Best regards,
Ridong