Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
until valid partition") introduced a new check to disallow the setting
of a new cpuset.cpus.exclusive value that is a superset of a sibling's
cpuset.cpus value so that there will at least be one CPU left in the
sibling in case the cpuset becomes a valid partition root. This new
check does have the side effect of failing a cpuset.cpus change that
makes it a subset of a sibling's cpuset.cpus.exclusive value.
With v2, users are supposed to be allowed to set whatever value they
want in cpuset.cpus without failure. To maintain this rule, the check
is now restricted to only when cpuset.cpus.exclusive is being changed,
not when cpuset.cpus is changed.
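
For illustration (hypothetical cgroup names and CPU numbers, assuming an
8-CPU system with two sibling cpusets A and B whose other constraints are
already satisfied):

  # echo 0-3 > A/cpuset.cpus.exclusive
  # echo 1-2 > B/cpuset.cpus
    (rejected before this patch, accepted with it)
  # echo 0-7 > A/cpuset.cpus.exclusive
    (still rejected with -EINVAL, as it is a superset of B's cpuset.cpus)
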
The cgroup-v2.rst doc file is also updated to reflect this change.
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 8 +++----
kernel/cgroup/cpuset.c | 30 ++++++++++++-------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7f5b59d95fce..510df2461aff 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2561,10 +2561,10 @@ Cpuset Interface Files
Users can manually set it to a value that is different from
"cpuset.cpus". One constraint in setting it is that the list of
CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
- of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
- isn't set, its "cpuset.cpus" value, if set, cannot be a subset
- of it to leave at least one CPU available when the exclusive
- CPUs are taken away.
+ and "cpuset.cpus.exclusive.effective" of its siblings. Another
+ constraint is that it cannot be a superset of "cpuset.cpus"
+ of a sibling, in order to leave at least one CPU available to
+ that sibling when the exclusive CPUs are taken away.
For a parent cgroup, any one of its exclusive CPUs can only
be distributed to at most one of its child cgroups. Having an
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 37d118a9ad4d..30e31fac4fe3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
/**
* cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
- * @cs1: first cpuset to check
- * @cs2: second cpuset to check
+ * @trial: the trial cpuset to be checked
+ * @sibling: a sibling cpuset to be checked against
+ * @xcpus_changed: set if exclusive_cpus is being changed
*
* Returns: true if CPU exclusivity conflict exists, false otherwise
*
* Conflict detection rules:
* 1. If either cpuset is CPU exclusive, they must be mutually exclusive
* 2. exclusive_cpus masks cannot intersect between cpusets
- * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
*/
-static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
+ bool xcpus_changed)
{
/* If either cpuset is exclusive, check if they are mutually exclusive */
- if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
- return !cpusets_are_exclusive(cs1, cs2);
+ if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
+ return !cpusets_are_exclusive(trial, sibling);
/* Exclusive_cpus cannot intersect */
- if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
+ if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
return true;
- /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
- if (!cpumask_empty(cs1->cpus_allowed) &&
- cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
- return true;
-
- if (!cpumask_empty(cs2->cpus_allowed) &&
- cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
+ /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
+ if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
+ cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
return true;
return false;
@@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
{
struct cgroup_subsys_state *css;
struct cpuset *c, *par;
+ bool xcpus_changed;
int ret = 0;
rcu_read_lock();
@@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
* overlap. exclusive_cpus cannot overlap with each other if set.
*/
ret = -EINVAL;
+ xcpus_changed = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus);
cpuset_for_each_child(c, css, par) {
if (c == cur)
continue;
- if (cpus_excl_conflict(trial, c))
+ if (cpus_excl_conflict(trial, c, xcpus_changed))
goto out;
if (mems_excl_conflict(trial, c))
goto out;
--
2.52.0
On 2026/1/2 3:15, Waiman Long wrote:
> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
> until valid partition") introduced a new check to disallow the setting
> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
> cpuset.cpus value so that there will at least be one CPU left in the
> sibling in case the cpuset becomes a valid partition root. This new
> check does have the side effect of failing a cpuset.cpus change that
> make it a subset of a sibling's cpuset.cpus.exclusive value.
>
> With v2, users are supposed to be allowed to set whatever value they
> want in cpuset.cpus without failure. To maintain this rule, the check
> is now restricted to only when cpuset.cpus.exclusive is being changed
> not when cpuset.cpus is changed.
>
Hi, Longman,
You've emphasized that modifying cpuset.cpus should never fail. However, I haven't found this
explicitly documented. Should we add it?
More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
constraints? This seems to be the underlying assumption in this patch.
On the implementation side, the patch looks good to me.
> The cgroup-v2.rst doc file is also updated to reflect this change.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 8 +++----
> kernel/cgroup/cpuset.c | 30 ++++++++++++-------------
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 7f5b59d95fce..510df2461aff 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2561,10 +2561,10 @@ Cpuset Interface Files
> Users can manually set it to a value that is different from
> "cpuset.cpus". One constraint in setting it is that the list of
> CPUs must be exclusive with respect to "cpuset.cpus.exclusive"
> - of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup
> - isn't set, its "cpuset.cpus" value, if set, cannot be a subset
> - of it to leave at least one CPU available when the exclusive
> - CPUs are taken away.
> + and "cpuset.cpus.exclusive.effective" of its siblings. Another
> + constraint is that it cannot be a superset of "cpuset.cpus"
> + of its sibling in order to leave at least one CPU available to
> + that sibling when the exclusive CPUs are taken away.
>
> For a parent cgroup, any one of its exclusive CPUs can only
> be distributed to at most one of its child cgroups. Having an
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 37d118a9ad4d..30e31fac4fe3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>
> /**
> * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts
> - * @cs1: first cpuset to check
> - * @cs2: second cpuset to check
> + * @trial: the trial cpuset to be checked
> + * @sibling: a sibling cpuset to be checked against
> + * @xcpus_changed: set if exclusive_cpus has been set
> *
> * Returns: true if CPU exclusivity conflict exists, false otherwise
> *
> * Conflict detection rules:
> * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> * 2. exclusive_cpus masks cannot intersect between cpusets
> - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
> + * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
> */
> -static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
> + bool xcpus_changed)
> {
> /* If either cpuset is exclusive, check if they are mutually exclusive */
> - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> - return !cpusets_are_exclusive(cs1, cs2);
> + if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
> + return !cpusets_are_exclusive(trial, sibling);
>
> /* Exclusive_cpus cannot intersect */
> - if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus))
> + if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
> return true;
>
> - /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
> - if (!cpumask_empty(cs1->cpus_allowed) &&
> - cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
> - return true;
> -
> - if (!cpumask_empty(cs2->cpus_allowed) &&
> - cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
> + /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
> + if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
> + cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
> return true;
>
> return false;
> @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> {
> struct cgroup_subsys_state *css;
> struct cpuset *c, *par;
> + bool xcpus_changed;
> int ret = 0;
>
> rcu_read_lock();
> @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
> * overlap. exclusive_cpus cannot overlap with each other if set.
> */
> ret = -EINVAL;
> + xcpus_changed = !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cpus);
> cpuset_for_each_child(c, css, par) {
> if (c == cur)
> continue;
> - if (cpus_excl_conflict(trial, c))
> + if (cpus_excl_conflict(trial, c, xcpus_changed))
> goto out;
> if (mems_excl_conflict(trial, c))
> goto out;
--
Best regards,
Ridong
On 1/4/26 2:09 AM, Chen Ridong wrote:
>
> On 2026/1/2 3:15, Waiman Long wrote:
>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>> until valid partition") introduced a new check to disallow the setting
>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>> cpuset.cpus value so that there will at least be one CPU left in the
>> sibling in case the cpuset becomes a valid partition root. This new
>> check does have the side effect of failing a cpuset.cpus change that
>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>
>> With v2, users are supposed to be allowed to set whatever value they
>> want in cpuset.cpus without failure. To maintain this rule, the check
>> is now restricted to only when cpuset.cpus.exclusive is being changed
>> not when cpuset.cpus is changed.
>>
> Hi, Longman,
>
> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
> explicitly documented. Should we add it?
>
> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
> constraints? This seems to be the underlying assumption in this patch.
Before the introduction of cpuset partition, writing to cpuset.cpus would
only fail if the cpu list was invalid, e.g. containing CPUs outside of the
valid cpu range. What I mean by "never-fail" is that if the cpu list is
valid, the write action should not fail. The rule is not explicitly
stated in the documentation, but it is a pre-existing behavior which we
should try to keep to avoid breaking existing applications.

The exclusive CPU constraint does not apply to cpuset.cpus. It only
applies when setting cpuset.cpus.exclusive with respect to the other
cpuset.cpus.exclusive* values in sibling cpusets. So I would not say one
has higher priority than the other.
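
A minimal sketch of that distinction (hypothetical cgroup names and CPU
numbers, with the patch applied and both siblings created directly under
the cgroup root):

# cd /sys/fs/cgroup
# mkdir c1 c2
# echo 0-3 > c1/cpuset.cpus.exclusive
# echo 2-5 > c2/cpuset.cpus
  (succeeds: cpuset.cpus is not checked against the sibling's exclusive CPUs)
# echo 2-5 > c2/cpuset.cpus.exclusive
  (fails with -EINVAL: it intersects c1's cpuset.cpus.exclusive)
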
Cheers,
Longman
On Sun, Jan 04, 2026 at 04:48:06PM -0500, Waiman Long <llong@redhat.com> wrote:
> Before the introduction of cpuset partition, writing to cpuset.cpus will
> only fail if the cpu list is invalid like containing CPUs outside of the
> valid cpu range. What I mean by "never-fail" is that if the cpu list is
> valid, the write action should not fail. The rule is not explicitly stated
> in the documentation, but it is a pre-existing behavior which we should try
> to keep to avoid breaking existing applications.

The justification for such behavior is that when the configuration cannot
be satisfied immediately (insufficient resources in ancestors), the
original user's intention should be stored somewhere and if the conditions
higher up the hierarchy possibly change, the intended config is effected
transparently (w/out the need to re-write values by user again).

So I appreciate that cpuset.cpus.exclusive writes fail early -- for
sibling conflicts -- otherwise the order of creation would need to be
evaluated post hoc. For illustration:

  a1/cpuset.cpus.exclusive=0,1
  a2/cpuset.cpus.exclusive=1,2
  a3/cpuset.cpus.exclusive=1,3

If this was allowed and a1 was rmdir'd, the (new) resolution of conflict
between a2 and a3 would need to determine which of a2, a3 was created
first.

HTH,
Michal
On 2026/1/5 5:48, Waiman Long wrote:
> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>
>> On 2026/1/2 3:15, Waiman Long wrote:
>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>> until valid partition") introduced a new check to disallow the setting
>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>> cpuset.cpus value so that there will at least be one CPU left in the
>>> sibling in case the cpuset becomes a valid partition root. This new
>>> check does have the side effect of failing a cpuset.cpus change that
>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>
>>> With v2, users are supposed to be allowed to set whatever value they
>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>> not when cpuset.cpus is changed.
>>>
>> Hi, Longman,
>>
>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>> explicitly documented. Should we add it?
>>
>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>> constraints? This seems to be the underlying assumption in this patch.
>
> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
> existing applications.
>
There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
on device) and EBUSY.
I just want to ensure the behavior aligns with our design intent.
Consider this example:
# cd /sys/fs/cgroup/
# mkdir test
# echo 1 > test/cpuset.cpus
# echo $$ > test/cgroup.procs
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo > test/cpuset.cpus
-bash: echo: write error: No space left on device
In cgroup v2, if the test cgroup's cpuset.cpus is cleared, it could inherit the parent's effective
CPUs. My question is: should clearing cpuset.cpus still fail (returning an error) when the cgroup
is populated?
> The exclusive CPU constraint does not apply to cpuset.cpus. It only applies when setting
> cpuset.cpus.exclusive wrt to other cpuset.cpus.exclusive* in sibling cpusets. So I will not say one
> has higher priority than the other.
>
> Cheers,
> Longman
>
--
Best regards,
Ridong
On 1/4/26 8:35 PM, Chen Ridong wrote:
>
> On 2026/1/5 5:48, Waiman Long wrote:
>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>> until valid partition") introduced a new check to disallow the setting
>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>> check does have the side effect of failing a cpuset.cpus change that
>>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>
>>>> With v2, users are supposed to be allowed to set whatever value they
>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>> not when cpuset.cpus is changed.
>>>>
>>> Hi, Longman,
>>>
>>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>>> explicitly documented. Should we add it?
>>>
>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>> constraints? This seems to be the underlying assumption in this patch.
>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>> existing applications.
>>
> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
> on device) and EBUSY.
>
> I just want to ensure the behavior aligns with our design intent.
>
> Consider this example:
>
> # cd /sys/fs/cgroup/
> # mkdir test
> # echo 1 > test/cpuset.cpus
> # echo $$ > test/cgroup.procs
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> # echo > test/cpuset.cpus
> -bash: echo: write error: No space left on device
>
> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
> populated?
Good catch. This error is for v1. It shouldn't apply for v2. Yes, I
think we should fix that for v2.
Cheers,
Longman
On 2026/1/5 11:59, Waiman Long wrote:
> On 1/4/26 8:35 PM, Chen Ridong wrote:
>>
>> On 2026/1/5 5:48, Waiman Long wrote:
>>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>>> until valid partition") introduced a new check to disallow the setting
>>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>>> check does have the side effect of failing a cpuset.cpus change that
>>>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>>
>>>>> With v2, users are supposed to be allowed to set whatever value they
>>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>>> not when cpuset.cpus is changed.
>>>>>
>>>> Hi, Longman,
>>>>
>>>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>>>> explicitly documented. Should we add it?
>>>>
>>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>>> constraints? This seems to be the underlying assumption in this patch.
>>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>>> existing applications.
>>>
>> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
>> on device) and EBUSY.
>>
>> I just want to ensure the behavior aligns with our design intent.
>>
>> Consider this example:
>>
>> # cd /sys/fs/cgroup/
>> # mkdir test
>> # echo 1 > test/cpuset.cpus
>> # echo $$ > test/cgroup.procs
>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>> # echo > test/cpuset.cpus
>> -bash: echo: write error: No space left on device
>>
>> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
>> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
>> populated?
>
> Good catch. This error is for v1. It shouldn't apply for v2. Yes, I think we should fix that for v2.
>
The EBUSY check (through cpuset_cpumask_can_shrink) is necessary, correct?
Since the subsequent patch modifies exclusive checking for v1, should we consolidate all v1-related
code into a separate function like cpuset1_validate_change() (maybe with some duplicate code)? It
would allow us to isolate the v1 logic and avoid having to account for v1 implementation details in
future features.
In other words:
validate_change(...)
{
	if (!is_in_v2_mode())
		return cpuset1_validate_change(cur, trial);
	...
	// only v2 code here
}
--
Best regards,
Ridong
On 1/5/26 2:00 AM, Chen Ridong wrote:
>
> On 2026/1/5 11:59, Waiman Long wrote:
>> On 1/4/26 8:35 PM, Chen Ridong wrote:
>>> On 2026/1/5 5:48, Waiman Long wrote:
>>>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>>>> until valid partition") introduced a new check to disallow the setting
>>>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>>>> check does have the side effect of failing a cpuset.cpus change that
>>>>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>>>
>>>>>> With v2, users are supposed to be allowed to set whatever value they
>>>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>>>> not when cpuset.cpus is changed.
>>>>>>
>>>>> Hi, Longman,
>>>>>
>>>>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>>>>> explicitly documented. Should we add it?
>>>>>
>>>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>>>> constraints? This seems to be the underlying assumption in this patch.
>>>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>>>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>>>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>>>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>>>> existing applications.
>>>>
>>> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
>>> on device) and EBUSY.
>>>
>>> I just want to ensure the behavior aligns with our design intent.
>>>
>>> Consider this example:
>>>
>>> # cd /sys/fs/cgroup/
>>> # mkdir test
>>> # echo 1 > test/cpuset.cpus
>>> # echo $$ > test/cgroup.procs
>>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>>> # echo > test/cpuset.cpus
>>> -bash: echo: write error: No space left on device
>>>
>>> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
>>> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
>>> populated?
>> Good catch. This error is for v1. It shouldn't apply for v2. Yes, I think we should fix that for v2.
>>
> The EBUSY check (through cpuset_cpumask_can_shrink) is necessary, correct?
Yes, it is a check needed by the deadline scheduler, irrespective of
whether v1 or v2 is used.
>
> Since the subsequent patch modifies exclusive checking for v1, should we consolidate all v1-related
> code into a separate function like cpuset1_validate_change() (maybe come duplicate code)?, it would
> allow us to isolate v1 logic and avoid having to account for v1 implementation details in future
> features.
>
> In other words:
>
> validate_change(...)
> {
> if (!is_in_v2_mode())
> return cpuset1_validate_change(cur, trial);
> ...
> // only v2 code here
> }
>
Yes, we could move the code to cpuset1_validate_change().
Cheers,
Longman