[cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping

Waiman Long posted 3 patches 3 months, 1 week ago
There is a newer version of this series
[cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Waiman Long 3 months, 1 week ago
From: Gabriele Monaco <gmonaco@redhat.com>

Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel imgration).

Prevent this configuration by blocking any assignation that would cause
the union of domain isolated cpus and nohz_full to covers all CPUs.

Acked-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da770dac955e..d6d459c95d82 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
 		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
 }
 
+/*
+ * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
+ * @prs: new or old partition_root_state
+ * @parent: parent cpuset
+ * Return: true if isolated_cpus needs modification, false otherwise
+ */
+static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
+{
+	if (!parent)
+		parent = &top_cpuset;
+	return prs != parent->partition_root_state;
+}
+
 /*
  * partition_xcpus_add - Add new exclusive CPUs to partition
  * @new_prs: new partition_root_state
@@ -1393,6 +1406,42 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
+/*
+ * isolated_cpus_can_update - check for isolated & nohz_full conflicts
+ * @add_cpus: cpu mask for cpus that are going to be isolated
+ * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
+ * Return: false if there is conflict, true otherwise
+ *
+ * If nohz_full is enabled and we have isolated CPUs, their combination must
+ * still leave housekeeping CPUs.
+ */
+static bool isolated_cpus_can_update(struct cpumask *add_cpus,
+				     struct cpumask *del_cpus)
+{
+	cpumask_var_t full_hk_cpus;
+	int res = true;
+
+	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		return true;
+
+	if (del_cpus && cpumask_weight_and(del_cpus,
+			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
+		return true;
+
+	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
+		return false;
+
+	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
+		    housekeeping_cpumask(HK_TYPE_DOMAIN));
+	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
+	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
+	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
+		res = false;
+
+	free_cpumask_var(full_hk_cpus);
+	return res;
+}
+
 static void update_isolation_cpumasks(bool isolcpus_updated)
 {
 	int ret;
@@ -1551,6 +1600,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
 	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
 		return PERR_INVCPUS;
+	if (isolated_cpus_should_update(new_prs, NULL) &&
+	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
+		return PERR_HKEEPING;
 
 	spin_lock_irq(&callback_lock);
 	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1650,6 +1702,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
 			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
 			cs->prs_err = PERR_NOCPUS;
+		else if (isolated_cpus_should_update(prs, NULL) &&
+			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+			cs->prs_err = PERR_HKEEPING;
 		if (cs->prs_err)
 			goto invalidate;
 	}
@@ -1988,6 +2043,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 			return err;
 	}
 
+	if (deleting && isolated_cpus_should_update(new_prs, parent) &&
+	    !isolated_cpus_can_update(tmp->delmask, tmp->addmask)) {
+		cs->prs_err = PERR_HKEEPING;
+		return PERR_HKEEPING;
+	}
+
 	/*
 	 * Change the parent's effective_cpus & effective_xcpus (top cpuset
 	 * only).
@@ -2994,7 +3055,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		 * A change in load balance state only, no change in cpumasks.
 		 * Need to update isolated_cpus.
 		 */
-		isolcpus_updated = true;
+		if ((new_prs == PRS_ISOLATED) &&
+		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
+			err = PERR_HKEEPING;
+		else
+			isolcpus_updated = true;
 	} else {
 		/*
 		 * Switching back to member is always allowed even if it
-- 
2.51.1
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Chen Ridong 3 months, 1 week ago


> +
>  	/*
>  	 * Change the parent's effective_cpus & effective_xcpus (top cpuset
>  	 * only).
> @@ -2994,7 +3055,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>  		 * A change in load balance state only, no change in cpumasks.
>  		 * Need to update isolated_cpus.
>  		 */
> -		isolcpus_updated = true;
> +		if ((new_prs == PRS_ISOLATED) &&
> +		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
> +			err = PERR_HKEEPING;
> +		else
> +			isolcpus_updated = true;
>  	} else {
>  		/*
>  		 * Switching back to member is always allowed even if it

This is an issue I am trying to fix, the prstate_housekeeping_conflict check is necessary.

https://lore.kernel.org/cgroups/20251025064844.495525-2-chenridong@huaweicloud.com/

-- 
Best regards,
Ridong
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Waiman Long 3 months, 1 week ago
On 11/2/25 10:53 PM, Chen Ridong wrote:
>
>
>> +
>>   	/*
>>   	 * Change the parent's effective_cpus & effective_xcpus (top cpuset
>>   	 * only).
>> @@ -2994,7 +3055,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>>   		 * A change in load balance state only, no change in cpumasks.
>>   		 * Need to update isolated_cpus.
>>   		 */
>> -		isolcpus_updated = true;
>> +		if ((new_prs == PRS_ISOLATED) &&
>> +		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
>> +			err = PERR_HKEEPING;
>> +		else
>> +			isolcpus_updated = true;
>>   	} else {
>>   		/*
>>   		 * Switching back to member is always allowed even if it
> This is an issue I am trying to fix, the prstate_housekeeping_conflict check is necessary.
>
> https://lore.kernel.org/cgroups/20251025064844.495525-2-chenridong@huaweicloud.com/
>
You are right. We should add prstate_housekeeping_conflict() check here.

BTW, I found some issues when I looked further at the patch. So I am 
going to rewrite part of it and will send out a v2 after some testing.

Cheers,
Longman
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Chen Ridong 3 months, 1 week ago

On 2025/11/3 9:34, Waiman Long wrote:
> From: Gabriele Monaco <gmonaco@redhat.com>
> 
> Currently the user can set up isolated cpus via cpuset and nohz_full in
> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
> domain isolated nor nohz full). This can be a problem for other
> subsystems (e.g. the timer wheel imgration).
> 
> Prevent this configuration by blocking any assignation that would cause
> the union of domain isolated cpus and nohz_full to covers all CPUs.
> 
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index da770dac955e..d6d459c95d82 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>  		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
>  }
>  
> +/*
> + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
> + * @prs: new or old partition_root_state
> + * @parent: parent cpuset
> + * Return: true if isolated_cpus needs modification, false otherwise
> + */
> +static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
> +{
> +	if (!parent)
> +		parent = &top_cpuset;
> +	return prs != parent->partition_root_state;
> +}
> +
>  /*
>   * partition_xcpus_add - Add new exclusive CPUs to partition
>   * @new_prs: new partition_root_state
> @@ -1393,6 +1406,42 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>  	return isolcpus_updated;
>  }
>  
> +/*
> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
> + * @add_cpus: cpu mask for cpus that are going to be isolated
> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
> + * Return: false if there is conflict, true otherwise
> + *
> + * If nohz_full is enabled and we have isolated CPUs, their combination must
> + * still leave housekeeping CPUs.
> + */
> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
> +				     struct cpumask *del_cpus)
> +{
> +	cpumask_var_t full_hk_cpus;
> +	int res = true;
> +
> +	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
> +		return true;
> +
> +	if (del_cpus && cpumask_weight_and(del_cpus,
> +			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
> +		return true;
> +
> +	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
> +		return false;
> +
> +	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
> +		    housekeeping_cpumask(HK_TYPE_DOMAIN));
> +	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
> +	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
> +	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
> +		res = false;
> +
> +	free_cpumask_var(full_hk_cpus);
> +	return res;
> +}
> +
>  static void update_isolation_cpumasks(bool isolcpus_updated)
>  {
>  	int ret;
> @@ -1551,6 +1600,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>  	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>  	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>  		return PERR_INVCPUS;
> +	if (isolated_cpus_should_update(new_prs, NULL) &&
> +	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
> +		return PERR_HKEEPING;
>  
>  	spin_lock_irq(&callback_lock);
>  	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
> @@ -1650,6 +1702,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>  		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>  			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>  			cs->prs_err = PERR_NOCPUS;
> +		else if (isolated_cpus_should_update(prs, NULL) &&
> +			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
> +			cs->prs_err = PERR_HKEEPING;
>  		if (cs->prs_err)
>  			goto invalidate;
>  	}
> @@ -1988,6 +2043,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>  			return err;
>  	}
>  
> +	if (deleting && isolated_cpus_should_update(new_prs, parent) &&
> +	    !isolated_cpus_can_update(tmp->delmask, tmp->addmask)) {
> +		cs->prs_err = PERR_HKEEPING;
> +		return PERR_HKEEPING;
> +	}
> +
>  	/*
>  	 * Change the parent's effective_cpus & effective_xcpus (top cpuset
>  	 * only).
> @@ -2994,7 +3055,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>  		 * A change in load balance state only, no change in cpumasks.
>  		 * Need to update isolated_cpus.
>  		 */
> -		isolcpus_updated = true;
> +		if ((new_prs == PRS_ISOLATED) &&
> +		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
> +			err = PERR_HKEEPING;
> +		else
> +			isolcpus_updated = true;
>  	} else {
>  		/*
>  		 * Switching back to member is always allowed even if it

I'm considering whether I should introduce a new function that consolidates
isolated_cpus_should_update, isolated_cpus_can_update, and prstate_housekeeping_conflict.

Just like:

bool housekeeping_conflict(...)
{
	if (isolated_cpus_should_update && !isolated_cpus_can_update) {
		return ture;
	}
	return prstate_housekeeping_conflict();
}

Since all of these are related to isolated CPUs, putting them into a centralized function would make
the code much easier to maintain.

-- 
Best regards,
Ridong
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Chen Ridong 3 months, 1 week ago

On 2025/11/3 11:47, Chen Ridong wrote:
> 
> 
> On 2025/11/3 9:34, Waiman Long wrote:
>> From: Gabriele Monaco <gmonaco@redhat.com>
>>
>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>> domain isolated nor nohz full). This can be a problem for other
>> subsystems (e.g. the timer wheel imgration).
>>
>> Prevent this configuration by blocking any assignation that would cause
>> the union of domain isolated cpus and nohz_full to covers all CPUs.
>>
>> Acked-by: Frederic Weisbecker <frederic@kernel.org>
>> Reviewed-by: Waiman Long <longman@redhat.com>
>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 66 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da770dac955e..d6d459c95d82 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>>  		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
>>  }
>>  
>> +/*
>> + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
>> + * @prs: new or old partition_root_state
>> + * @parent: parent cpuset
>> + * Return: true if isolated_cpus needs modification, false otherwise
>> + */
>> +static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
>> +{
>> +	if (!parent)
>> +		parent = &top_cpuset;
>> +	return prs != parent->partition_root_state;
>> +}
>> +
>>  /*
>>   * partition_xcpus_add - Add new exclusive CPUs to partition
>>   * @new_prs: new partition_root_state
>> @@ -1393,6 +1406,42 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>>  	return isolcpus_updated;
>>  }
>>  
>> +/*
>> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
>> + * @add_cpus: cpu mask for cpus that are going to be isolated
>> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
>> + * Return: false if there is conflict, true otherwise
>> + *
>> + * If nohz_full is enabled and we have isolated CPUs, their combination must
>> + * still leave housekeeping CPUs.
>> + */
>> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
>> +				     struct cpumask *del_cpus)
>> +{
>> +	cpumask_var_t full_hk_cpus;
>> +	int res = true;
>> +
>> +	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
>> +		return true;
>> +
>> +	if (del_cpus && cpumask_weight_and(del_cpus,
>> +			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
>> +		return true;
>> +
>> +	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
>> +		return false;
>> +
>> +	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
>> +		    housekeeping_cpumask(HK_TYPE_DOMAIN));
>> +	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
>> +	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
>> +	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
>> +		res = false;
>> +
>> +	free_cpumask_var(full_hk_cpus);
>> +	return res;
>> +}
>> +
>>  static void update_isolation_cpumasks(bool isolcpus_updated)
>>  {
>>  	int ret;
>> @@ -1551,6 +1600,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>>  	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>>  	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>>  		return PERR_INVCPUS;
>> +	if (isolated_cpus_should_update(new_prs, NULL) &&
>> +	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
>> +		return PERR_HKEEPING;
>>  
>>  	spin_lock_irq(&callback_lock);
>>  	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
>> @@ -1650,6 +1702,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>>  		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>>  			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>>  			cs->prs_err = PERR_NOCPUS;
>> +		else if (isolated_cpus_should_update(prs, NULL) &&
>> +			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
>> +			cs->prs_err = PERR_HKEEPING;
>>  		if (cs->prs_err)
>>  			goto invalidate;
>>  	}
>> @@ -1988,6 +2043,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>>  			return err;
>>  	}
>>  
>> +	if (deleting && isolated_cpus_should_update(new_prs, parent) &&
>> +	    !isolated_cpus_can_update(tmp->delmask, tmp->addmask)) {
>> +		cs->prs_err = PERR_HKEEPING;
>> +		return PERR_HKEEPING;
>> +	}
>> +
>>  	/*
>>  	 * Change the parent's effective_cpus & effective_xcpus (top cpuset
>>  	 * only).
>> @@ -2994,7 +3055,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>>  		 * A change in load balance state only, no change in cpumasks.
>>  		 * Need to update isolated_cpus.
>>  		 */
>> -		isolcpus_updated = true;
>> +		if ((new_prs == PRS_ISOLATED) &&
>> +		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
>> +			err = PERR_HKEEPING;
>> +		else
>> +			isolcpus_updated = true;
>>  	} else {
>>  		/*
>>  		 * Switching back to member is always allowed even if it
> 
> I'm considering whether I should introduce a new function that consolidates
> isolated_cpus_should_update, isolated_cpus_can_update, and prstate_housekeeping_conflict.
> 

Sorry, we should introduce a new ...

> Just like:
> 
> bool housekeeping_conflict(...)
> {
> 	if (isolated_cpus_should_update && !isolated_cpus_can_update) {
> 		return ture;
> 	}
> 	return prstate_housekeeping_conflict();
> }
> 
> Since all of these are related to isolated CPUs, putting them into a centralized function would make
> the code much easier to maintain.
> 

-- 
Best regards,
Ridong
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Chen Ridong 3 months, 1 week ago

On 2025/11/3 9:34, Waiman Long wrote:
> From: Gabriele Monaco <gmonaco@redhat.com>
> 
> Currently the user can set up isolated cpus via cpuset and nohz_full in
> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
> domain isolated nor nohz full). This can be a problem for other
> subsystems (e.g. the timer wheel imgration).
> 
> Prevent this configuration by blocking any assignation that would cause
> the union of domain isolated cpus and nohz_full to covers all CPUs.
> 
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index da770dac955e..d6d459c95d82 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>  		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
>  }
>  
> +/*
> + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
> + * @prs: new or old partition_root_state
> + * @parent: parent cpuset
> + * Return: true if isolated_cpus needs modification, false otherwise
> + */
> +static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
> +{
> +	if (!parent)
> +		parent = &top_cpuset;
> +	return prs != parent->partition_root_state;
> +}
> +

Hi Longman,

I am confused about this function.

Why do we need to compare the partition_root_state (prs) with the parent's partition_root_state?

For example, when a local partition is assigned to a member, I don't think the isolated cpumasks
should be updated in this case.

In my understanding, the isolated CPUs should only be updated when an isolated partition is being
disabled or enabled. I was thinking of something like this:

bool isolated_cpus_should_update(int new_prs, int old_prs)
{
    if (new_prs == old_prs)
        return false;
    if (old_prs == 2 || new_prs == 2)
        return true;
    return false;
}

I would really appreciate it if you could provide some further explanation on this.

-- 
Best regards,
Ridong
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Waiman Long 3 months, 1 week ago
On 11/2/25 9:55 PM, Chen Ridong wrote:
>
> On 2025/11/3 9:34, Waiman Long wrote:
>> From: Gabriele Monaco <gmonaco@redhat.com>
>>
>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>> domain isolated nor nohz full). This can be a problem for other
>> subsystems (e.g. the timer wheel imgration).
>>
>> Prevent this configuration by blocking any assignation that would cause
>> the union of domain isolated cpus and nohz_full to covers all CPUs.
>>
>> Acked-by: Frederic Weisbecker <frederic@kernel.org>
>> Reviewed-by: Waiman Long <longman@redhat.com>
>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 66 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da770dac955e..d6d459c95d82 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
>>   		cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
>>   }
>>   
>> +/*
>> + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
>> + * @prs: new or old partition_root_state
>> + * @parent: parent cpuset
>> + * Return: true if isolated_cpus needs modification, false otherwise
>> + */
>> +static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
>> +{
>> +	if (!parent)
>> +		parent = &top_cpuset;
>> +	return prs != parent->partition_root_state;
>> +}
>> +
> Hi Longman,
>
> I am confused about this function.
>
> Why do we need to compare the partition_root_state (prs) with the parent's partition_root_state?
>
> For example, when a local partition is assigned to a member, I don't think the isolated cpumasks
> should be updated in this case.
>
> In my understanding, the isolated CPUs should only be updated when an isolated partition is being
> disabled or enabled. I was thinking of something like this:
>
> bool isolated_cpus_should_update(int new_prs, int old_prs)
> {
>      if (new_prs == old_prs)
>          return false;
>      if (old_prs == 2 || new_prs == 2)
>          return true;
>      return false;
> }
>
> I would really appreciate it if you could provide some further explanation on this.

This function should only be called when both the current and the parent 
(top_cpuset for remote) are valid partition roots. For both local and 
remote partition, the child cpuset takes CPUs from the parent. The list 
of isolated CPUs will only change if parent and child cpusets have 
different partition root types. If parent is an isolated partition, 
taking CPUs from parent to form another isolated partition will not 
change isolated_cpus. The same is true if both parent and child are root.

You are right that this check may not be obvious for a casual observer. 
I can add some more comments to clarify that.

Cheers,
Longman

>
Re: [cgroup/for-6.19 PATCH 2/3] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Posted by Chen Ridong 3 months, 1 week ago

On 2025/11/3 11:12, Waiman Long wrote:
> On 11/2/25 9:55 PM, Chen Ridong wrote:
>>
>> On 2025/11/3 9:34, Waiman Long wrote:
>>> From: Gabriele Monaco <gmonaco@redhat.com>
>>>
>>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>>> domain isolated nor nohz full). This can be a problem for other
>>> subsystems (e.g. the timer wheel imgration).
>>>
>>> Prevent this configuration by blocking any assignation that would cause
>>> the union of domain isolated cpus and nohz_full to covers all CPUs.
>>>
>>> Acked-by: Frederic Weisbecker <frederic@kernel.org>
>>> Reviewed-by: Waiman Long <longman@redhat.com>
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/cgroup/cpuset.c | 67 +++++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 66 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index da770dac955e..d6d459c95d82 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -1329,6 +1329,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask
>>> *xcpus
>>>           cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
>>>   }
>>>   +/*
>>> + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
>>> + * @prs: new or old partition_root_state
>>> + * @parent: parent cpuset
>>> + * Return: true if isolated_cpus needs modification, false otherwise
>>> + */
>>> +static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
>>> +{
>>> +    if (!parent)
>>> +        parent = &top_cpuset;
>>> +    return prs != parent->partition_root_state;
>>> +}
>>> +
>> Hi Longman,
>>
>> I am confused about this function.
>>
>> Why do we need to compare the partition_root_state (prs) with the parent's partition_root_state?
>>
>> For example, when a local partition is assigned to a member, I don't think the isolated cpumasks
>> should be updated in this case.
>>
>> In my understanding, the isolated CPUs should only be updated when an isolated partition is being
>> disabled or enabled. I was thinking of something like this:
>>
>> bool isolated_cpus_should_update(int new_prs, int old_prs)
>> {
>>      if (new_prs == old_prs)
>>          return false;
>>      if (old_prs == 2 || new_prs == 2)
>>          return true;
>>      return false;
>> }
>>
>> I would really appreciate it if you could provide some further explanation on this.
> 
> This function should only be called when both the current and the parent (top_cpuset for remote) are
> valid partition roots. For both local and remote partition, the child cpuset takes CPUs from the
> parent. The list of isolated CPUs will only change if parent and child cpusets have different
> partition root types. If parent is an isolated partition, taking CPUs from parent to form another
> isolated partition will not change isolated_cpus. The same is true if both parent and child are root.
> 
> You are right that this check may not be obvious for a casual observer. I can add some more comments
> to clarify that.
> 

Thank you for your explanation.

Yes, this can be a common scenario.

More comments will be helpful.

-- 
Best regards,
Ridong