[PATCH v2] cpuset: relax the overlap check for cgroup-v2

Sun Shaojie posted 1 patch 2 months, 3 weeks ago
There is a newer version of this series
kernel/cgroup/cpuset.c                            |  9 +++++++--
tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++-----
2 files changed, 12 insertions(+), 7 deletions(-)
Posted by Sun Shaojie 2 months, 3 weeks ago
In cgroup v2, a mutual overlap check is required when at least one of two
cpusets is exclusive. However, this check should be relaxed and limited to
cases where both cpusets are exclusive.

Table 1 shows the partition states of A1 and B1 after each step, before
this patch is applied.

Table 1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> mkdir -p A1                            | member       |              |
 #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
 #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
 #4> mkdir -p B1                            | root         | member       |
 #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
 #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

After step #5, A1 changes from "root" to "root invalid" because its CPUs
(0-1) overlap with those requested by B1 (0-3). However, B1 can actually
use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."

This patch relaxes the exclusive cpuset check for cgroup v2 while
preserving the current cgroup v1 behavior.

Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>

---
v1 -> v2:
  - Keeps the current cgroup v1 behavior unchanged
  - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat.com
---
 kernel/cgroup/cpuset.c                            |  9 +++++++--
 tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++-----
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..3240b3ab5998 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* If both cpusets are exclusive, check if they are mutually exclusive */
+	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
+		return !cpusets_are_exclusive(cs1, cs2);
+
+	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
+	if (!is_in_v2_mode() &&
+	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
 		return !cpusets_are_exclusive(cs1, cs2);
 
 	/* Exclusive_cpus cannot intersect */
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..903dddfe88d7 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0 0-4"
 	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
 
 	# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
 	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
+	# A non-exclusive cpuset.cpus change will not invalidate partition and its siblings
+	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
 	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
-- 
2.25.1
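
For readers following the thread, here is a minimal userspace sketch of the
first branch of cpus_excl_conflict() before and after this patch. It is only
a model, not the kernel code: CPU masks are plain 64-bit words, the "excl"
flags stand in for is_cpu_exclusive(), "v2_mode" stands in for
is_in_v2_mode(), and the names cpu_range(), conflict_before() and
conflict_after() are made up for illustration. The later checks in the real
function (such as "Exclusive_cpus cannot intersect") are left out, so the
model simply returns false where the kernel would fall through to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bitmask with CPUs lo..hi set (inclusive). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = (uint64_t)hi - lo + 1;
	uint64_t bits = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return bits << lo;
}

/* Model of the old rule: either cpuset exclusive => any overlap conflicts. */
static bool conflict_before(uint64_t cpus1, bool excl1,
			    uint64_t cpus2, bool excl2)
{
	if (excl1 || excl2)
		return (cpus1 & cpus2) != 0;
	return false;	/* the kernel falls through to further checks here */
}

/* Model of the patched rule: both exclusive, or v1 with exactly one exclusive. */
static bool conflict_after(uint64_t cpus1, bool excl1,
			   uint64_t cpus2, bool excl2, bool v2_mode)
{
	if (excl1 && excl2)
		return (cpus1 & cpus2) != 0;
	if (!v2_mode && (excl1 != excl2))
		return (cpus1 & cpus2) != 0;
	return false;	/* the kernel falls through to further checks here */
}
```

With A1 exclusive on CPUs 0-1 and B1 non-exclusive on CPUs 0-3 (step #5 of
Table 1), the old model reports a conflict, while the patched model reports
one only when v2_mode is false.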
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/13 21:14, Sun Shaojie wrote:
> In cgroup v2, a mutual overlap check is required when at least one of two
> cpusets is exclusive. However, this check should be relaxed and limited to
> cases where both cpusets are exclusive.
> 
> The table 1 shows the partition states of A1 and B1 after each step before
> applying this patch.
> 
> Table 1: Before applying the patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> mkdir -p A1                            | member       |              |
>  #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>  #4> mkdir -p B1                            | root         | member       |
>  #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
> 
> After step #5, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."
> 
> This patch relaxes the exclusive cpuset check for cgroup v2 while
> preserving the current cgroup v1 behavior.
> 
> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
> 
> ---
> v1 -> v2:
>   - Keeps the current cgroup v1 behavior unchanged
>   - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat.com
> ---
>  kernel/cgroup/cpuset.c                            |  9 +++++++--
>  tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++-----
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..3240b3ab5998 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>   */
>  static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>  {
> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
> +		return !cpusets_are_exclusive(cs1, cs2);
> +
> +	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
> +	if (!is_in_v2_mode() &&
> +	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
>  		return !cpusets_are_exclusive(cs1, cs2);
>  

I would prefer adding a helper function to cpuset-v1.c, something like
cpus_excl_conflict_legacy().

For cpuset v1, cpus_excl_conflict() can then simply return
cpus_excl_conflict_legacy(). The other rules do not seem relevant to v1.

>  	/* Exclusive_cpus cannot intersect */
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..903dddfe88d7 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
>  	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
>  	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
>  	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> -	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> +	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2 2-3"
>  	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
>  	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
>  	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
>  	# Invalid to valid local partition direct transition tests
>  	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
>  	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> -	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> +	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0 0-4"
>  	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>  
>  	# Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
>  	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
>  	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>  
> -	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> -	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> +	# A non-exclusive cpuset.cpus change will not invalidate partition and its siblings
> +	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
>  	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> -	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> +	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>  
>  	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
>  	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"

-- 
Best regards,
Ridong
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/15 08:58, Chen Ridong wrote:
>On 2025/11/15 0:14, Michal Koutný wrote:
>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>> After further consideration, I still suggest retaining this rule.
>> 
>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>> Shaojie with both before/after tables will explain it.
>> 
>
>The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>exclusive" to
>"If both cpusets are exclusive, check if they are mutually exclusive"
>
>  -    /* If either cpuset is exclusive, check if they are mutually exclusive */
>  -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>  +    /* If both cpusets are exclusive, check if they are mutually exclusive */
>  +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>  +        return !cpusets_are_exclusive(cs1, cs2);
>
>I suggest not modifying this rule and keeping the original logic intact:
>
>>> For example:
>>>   Step                                       | A1's prstate | B1's prstate |
>>>   #1> mkdir -p A1                            | member       |              |
>>>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>>   #4> mkdir -p B1                            | root         | member       |
>>>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>>
>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>> invalidated?
>> 
>> A1 is invalidated because it doesn't have exclusive ownership of CPU 0
>> anymore.
>> 
>>> B1 could also use the parent's effective CPUs, right?
>> 
>> Here you assume some ordering between siblings treating A1 more
>> important than B1. But it's symmetrical in principle, no?
>> 
>
>I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>following the same logic—but I’m not in favor of doing so.

Hi, Ridong,

Thank you for pointing out the issue with the current patch; this is indeed
not what our product intends. I must admit that I haven't thoroughly tested
on such recent kernel versions.

This patch is clearly flawed, but a v3 is still needed. Regarding the
"other rules" you mentioned, we do not intend to relax them. On the
contrary, we aim to maintain them firmly.

Our product needs to ensure the following behavior: in cgroup-v2, user
modifications to one cpuset should not affect the partition state of its
sibling cpusets. This is justified and meaningful, as it aligns with the
isolation characteristics of cgroups.

This can be divided into two scenarios:
Scenario 1: Only one of A1 and B1 is "root".
Scenario 2: Both A1 and B1 are "root".

We plan to implement Scenario 1 first. This is the goal of patch v2.
However, patch v2 is flawed because it does not strictly adhere to the
existing rule stated below (Rule 1).

It is worth noting that the current cgroup v2 implementation does not
strictly adhere to this rule either; fixing that is also an objective
for patch v3.

Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".

Using your example to illustrate.
 Step (refer to the steps in the table below)
 #1> mkdir -p A1                           
 #2> echo "0-1" > A1/cpuset.cpus.exclusive 
 #3> echo "root" > A1/cpuset.cpus.partition
 #4> mkdir -p B1               
 #5> echo "0" > B1/cpuset.cpus 

Table 1: Current result
 Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
 #1   | 0      |                |           | member       |              |
 #2   | 0      | 0-1            |           | member       |              |
 #3   | 0      | 0-1            |           | root         |              |
 #4   | 0      | 0-1            |           | root         | member       |
 #5   | 0      | 0-1            | 0         | root invalid | member       |

Table 2: Expected result
 Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
 #1   | 0      |                |           | member       |              |
 #2   | 0      | 0-1            |           | member       |              |
 #3   | 0      | 0-1            |           | root         |              |
 #4   | 0      | 0-1            |           | root         | member       |
 #5   | error  | 0-1            |           | root         | member       |

Currently, step #5 returns success, which clearly violates Rule 1,
because B1's "cpuset.cpus" is a subset of A1's "cpuset.cpus.exclusive".

Therefore, step #5 should return an error, with A1 remaining "root".
This complies better with Rule 1.
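
The subset test behind Rule 1 can be sketched in userspace as follows. This
is an illustrative model only: CPU masks are plain 64-bit words, and the
names cpu_range() and violates_rule1() are hypothetical, not kernel
functions. An empty "cpuset.cpus" is treated as not violating the rule,
which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bitmask with CPUs lo..hi set (inclusive). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = (uint64_t)hi - lo + 1;
	uint64_t bits = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return bits << lo;
}

/*
 * Rule 1 model: a non-empty "cpuset.cpus" must not be a subset of a
 * sibling's "cpuset.cpus.exclusive".
 */
static bool violates_rule1(uint64_t cpus, uint64_t sibling_excl)
{
	return cpus != 0 && (cpus & ~sibling_excl) == 0;
}
```

For step #5 above, violates_rule1(cpu_range(0, 0), cpu_range(0, 1)) is true,
whereas a request for CPUs 0-3 overlaps A1's exclusive CPUs without being a
subset of them, so the model does not flag it.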

------
The following content is provided for reference, and we hope it may be 
adopted in the future.
Note: these are not part of what patch v3 will implement.

As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current 
cgroup v2 behavior. This patch series does not modify it, but we hope to 
draw the maintainers' attention, as we indeed have plans for future 
modifications. Our intent can be seen from the following examples.

For example:
 Step (refer to the steps in the table below)
 #1> mkdir -p A1                           
 #2> echo "0-1"  > A1/cpuset.cpus 
 #3> echo "root" > A1/cpuset.cpus.partition
 #4> mkdir -p B1               
 #5> echo "2-3"  > B1/cpuset.cpus 
 #6> echo "root" > B1/cpuset.cpus.partition
 #7> echo "1-2"  > B1/cpuset.cpus

Table 1: Current result
 Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
 #1   | from parent   |               | member       |              |
 #2   | 0-1           |               | member       |              |
 #3   | 0-1           |               | root         |              |
 #4   | 0-1           | from parent   | root         | member       |
 #5   | 0-1           | 2-3           | root         | member       |
 #6   | 0-1           | 2-3           | root         | root         |
 #7   | 0-1           | 1-2           | root invalid | root invalid |

Table 2: Expected result
 Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
 #1   | from parent   |               | member       |              |
 #2   | 0-1           |               | member       |              |
 #3   | 0-1           |               | root         |              |
 #4   | 0-1           | from parent   | root         | member       |
 #5   | 0-1           | 2-3           | root         | member       |
 #6   | 0-1           | 2-3           | root         | root         |
 #7   | 0-1           | 2             | root         | root invalid |

After step #7, we expect A1 to remain "root" (unaffected), while only B1 
becomes "root invalid".
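
The effective CPUs in the expected result can be sketched as "requested CPUs
minus the CPUs a sibling partition root already owns". This is a simplified
userspace model under that assumption; the real computation also depends on
the parent's effective CPUs, which is ignored here. The names cpu_range()
and effective_cpus() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmask with CPUs lo..hi set (inclusive). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = (uint64_t)hi - lo + 1;
	uint64_t bits = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return bits << lo;
}

/* CPUs left over once a sibling partition root keeps its own CPUs. */
static uint64_t effective_cpus(uint64_t requested, uint64_t sibling_owned)
{
	return requested & ~sibling_owned;
}
```

For step #7, B1 requests CPUs 1-2 while A1 owns CPUs 0-1, so the model
leaves B1 with CPU 2 only, matching the expected table.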

 
The following Rule 2 and Rule 3 are also implemented and adhered to by our
product. The current cgroup v2 implementation does not enforce them.
Likewise, we hope this will draw the maintainers' attention. Perhaps they
can be applied in the future.

Rule 2: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.effective"
        must either be a subset of it, or "cpuset.cpus.effective" is null.

Rule 3: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.exclusive"
        must either be a subset of it, or "cpuset.cpus.exclusive" is null.

Rationale: "cpuset.cpus" represents the CPUs requested by the user, and the
        system should honor the user's intention.

---
Thanks,
Sun Shaojie


Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Waiman Long 2 months, 3 weeks ago
On 11/15/25 1:02 AM, Sun Shaojie wrote:
> On 2025/11/15 08:58, Chen Ridong wrote:
>> On 2025/11/15 0:14, Michal Koutný wrote:
>>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>> After further consideration, I still suggest retaining this rule.
>>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>>> Shaojie with both before/after tables will explain it.
>>>
>> The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>> exclusive" to
>> "If both cpusets are exclusive, check if they are mutually exclusive"
>>
>>   -    /* If either cpuset is exclusive, check if they are mutually exclusive */
>>   -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>>   +    /* If both cpusets are exclusive, check if they are mutually exclusive */
>>   +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>>   +        return !cpusets_are_exclusive(cs1, cs2);
>>
>> I suggest not modifying this rule and keeping the original logic intact:
>>
>>>> For example:
>>>>    Step                                       | A1's prstate | B1's prstate |
>>>>    #1> mkdir -p A1                            | member       |              |
>>>>    #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>>>    #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>>>    #4> mkdir -p B1                            | root         | member       |
>>>>    #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>>>
>>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>>> invalidated?
>>> A1 is invalidated because it doesn't have exclusive ownership of CPU 0
>>> anymore.
>>>
>>>> B1 could also use the parent's effective CPUs, right?
>>> Here you assume some ordering between siblings treating A1 more
>>> important than B1. But it's symmetrical in principle, no?
>>>
>> I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>> following the same logic—but I’m not in favor of doing so.
> Hi, Ridong,
>
> Thank you for pointing out the issue with the current patch; this is indeed
> not what our product intends. I must admit that I haven't thoroughly tested
> on such recent kernel versions.
>
> Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
> "other rules" you mentioned, we do not intend to relax them. On the
> contrary, we aim to maintain them firmly.
>
> Our product needs to ensure the following behavior: in cgroup-v2, user
> modifications to one cpuset should not affect the partition state of its
> sibling cpusets. This is justified and meaningful, as it aligns with the
> isolation characteristics of cgroups.
>
> This can be divided into two scenarios:
> Scenario 1: Only one of A1 and B1 is "root".
> Scenario 2: Both A1 and B1 are "root".
>
> We plan to implement Scenario 1 first. This is the goal of patch v2.
> However, patch v2 is flawed because it does not strictly adhere to the
> following existing rule.
>
> However, it is worth noting that the current cgroup v2 implementation does
> not strictly adhere to the following rule either (which is also an
> objective for patch v3 to address).
>
> Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".

Inside the cpuset code, the rule should be "cpuset.cpus should not be a
subset of a sibling's cpuset.cpus.exclusive".

Note that one rule that should always be followed in v2 is that writing
any valid cpumask into cpuset.cpus is allowed, but writing to
cpuset.cpus.exclusive may fail if some rules are violated. If the new
cpuset.cpus violates the rules for a sibling partition root, the current
code will invalidate the sibling partition. I am not against changing
cpuset.cpus.effective to a suitable value to avoid the conflict
instead of invalidating a sibling partition. We do need to make sure that
the new behavior is consistent under different circumstances.

>
> Using your example to illustrate.
>   Step (refer to the steps in the table below)
>   #1> mkdir -p A1
>   #2> echo "0-1" > A1/cpuset.cpus.exclusive
>   #3> echo "root" > A1/cpuset.cpus.partition
>   #4> mkdir -p B1
>   #5> echo "0" > B1/cpuset.cpus
>
> Table 1: Current result
>   Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>   #1   | 0      |                |           | member       |              |
>   #2   | 0      | 0-1            |           | member       |              |
>   #3   | 0      | 0-1            |           | root         |              |
>   #4   | 0      | 0-1            |           | root         | member       |
>   #5   | 0      | 0-1            | 0         | root invalid | member       |
>
> Table 2: Expected result
>   Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>   #1   | 0      |                |           | member       |              |
>   #2   | 0      | 0-1            |           | member       |              |
>   #3   | 0      | 0-1            |           | root         |              |
>   #4   | 0      | 0-1            |           | root         | member       |
>   #5   | error  | 0-1            |           | root         | member       |
>
> Currently, after step #5, the operation returns success, which clearly
> violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's
> "cpuset.cpus.exclusive".
>
> Therefore, after step #5, the operation should return error, with A1
> remaining as "root". This better complies with the Rule 1.
>
> ------
> The following content is provided for reference, and we hope it may be
> adopted in the future.
> !!These are not part of what patch v3 will implement.
>
> As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current
> cgroup v2 behavior. This patch series does not modify it, but we hope to
> draw the maintainers' attention, as we indeed have plans for future
> modifications. Our intent can be seen from the following examples.
>
> For example:
>   Step (refer to the steps in the table below)
>   #1> mkdir -p A1
>   #2> echo "0-1"  > A1/cpuset.cpus
>   #3> echo "root" > A1/cpuset.cpus.partition
>   #4> mkdir -p B1
>   #5> echo "2-3"  > B1/cpuset.cpus
>   #6> echo "root" > B1/cpuset.cpus.partition
>   #7> echo "1-2"  > B1/cpuset.cpus
>
> Table 1: Current result
>   Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>   #1   | from parent   |               | member       |              |
>   #2   | 0-1           |               | member       |              |
>   #3   | 0-1           |               | root         |              |
>   #4   | 0-1           | from parent   | root         | member       |
>   #5   | 0-1           | 2-3           | root         | member       |
>   #6   | 0-1           | 2-3           | root         | root         |
>   #7   | 0-1           | 1-2           | root invalid | root invalid |
>
> Table 2: Expected result
>   Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>   #1   | from parent   |               | member       |              |
>   #2   | 0-1           |               | member       |              |
>   #3   | 0-1           |               | root         |              |
>   #4   | 0-1           | from parent   | root         | member       |
>   #5   | 0-1           | 2-3           | root         | member       |
>   #6   | 0-1           | 2-3           | root         | root         |
>   #7   | 0-1           | 2             | root         | root invalid |
>
> After step #7, we expect A1 to remain "root" (unaffected), while only B1
> becomes "root invalid".
>
>   
> The following Rule 2 and Rule 3 are also implemented and adhered to by our
> product. The current cgroup v2 implementation does not enforce them.
> Likewise, we hope this will draw the maintainers' attention. Maybe, they can
> be applied in the future.
>
> Rule 2: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.effective"
>          must either be a subset of it, or "cpuset.cpus.effective" is null.

cpuset.cpus.effective will never be null in v2 with the exception of a 
partition root distributing out all its CPUs to child sub-partitions.


>
> Rule 3: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.exclusive"
>          must either be a subset of it, or "cpuset.cpus.exclusive" is null.

We currently don't have this rule in the cpuset code as 
cpuset.cpus.exclusive can be independent of "cpuset.cpus".  We could 
implement this rule by failing the cpuset.cpus.exclusive write in this 
case, but we can't fail the write to "cpuset.cpus" if the rule is violated.
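
A minimal userspace sketch of that asymmetry, for illustration only: the
function names, the 64-bit mask representation and the choice of -EINVAL as
the error code are all assumptions of this sketch, not existing kernel
behavior. A write to "cpuset.cpus" is always accepted, while a write to
"cpuset.cpus.exclusive" is rejected when it would break Rule 3.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: bitmask with CPUs lo..hi set (inclusive). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	uint64_t width = (uint64_t)hi - lo + 1;
	uint64_t bits = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return bits << lo;
}

/* Model: any valid mask written to "cpuset.cpus" is accepted. */
static int check_cpus_write(uint64_t new_cpus)
{
	(void)new_cpus;
	return 0;
}

/*
 * Model of Rule 3 enforced on the exclusive write only: when "cpuset.cpus"
 * is non-empty, a non-empty "cpuset.cpus.exclusive" must be a subset of it.
 */
static int check_exclusive_write(uint64_t cpus, uint64_t new_excl)
{
	if (cpus != 0 && new_excl != 0 && (new_excl & ~cpus) != 0)
		return -EINVAL;
	return 0;
}
```

In this model, writing exclusive CPUs 0-3 into a cpuset whose "cpuset.cpus"
is 0-1 fails, while the same write succeeds when "cpuset.cpus" is empty or
already covers CPUs 0-3.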

Cheers,
Longman

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/15 14:02, Sun Shaojie wrote:
> On 2025/11/15 08:58, Chen Ridong wrote:
>> On 2025/11/15 0:14, Michal Koutný wrote:
>>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>> After further consideration, I still suggest retaining this rule.
>>>
>>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>>> Shaojie with both before/after tables will explain it.
>>>
>>
>> The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>> exclusive" to
>> "If both cpusets are exclusive, check if they are mutually exclusive"
>>
>>  -    /* If either cpuset is exclusive, check if they are mutually exclusive */
>>  -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>>  +    /* If both cpusets are exclusive, check if they are mutually exclusive */
>>  +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>>  +        return !cpusets_are_exclusive(cs1, cs2);
>>
>> I suggest not modifying this rule and keeping the original logic intact:
>>
>>>> For example:
>>>>   Step                                       | A1's prstate | B1's prstate |
>>>>   #1> mkdir -p A1                            | member       |              |
>>>>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>>>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>>>   #4> mkdir -p B1                            | root         | member       |
>>>>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>>>
>>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>>> invalidated?
>>>
>>> A1 is invalidated because it doesn't have exclusive ownership of CPU 0
>>> anymore.
>>>
>>>> B1 could also use the parent's effective CPUs, right?
>>>
>>> Here you assume some ordering between siblings treating A1 more
>>> important than B1. But it's symmetrical in principle, no?
>>>
>>
>> I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>> following the same logic—but I’m not in favor of doing so.
> 
> Hi, Ridong,
> 
> Thank you for pointing out the issue with the current patch; this is indeed
> not what our product intends. I must admit that I haven't thoroughly tested
> on such recent kernel versions.
> 
> Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
> "other rules" you mentioned, we do not intend to relax them. On the 
> contrary, we aim to maintain them firmly.
> 
> Our product needs to ensure the following behavior: in cgroup-v2, user
> modifications to one cpuset should not affect the partition state of its 
> sibling cpusets. This is justified and meaningful, as it aligns with the 
> isolation characteristics of cgroups.
> 

This is ideal in theory, but I don’t think it’s practical in reality.

> This can be divided into two scenarios:
> Scenario 1: Only one of A1 and B1 is "root".
> Scenario 2: Both A1 and B1 are "root".
> 
> We plan to implement Scenario 1 first. This is the goal of patch v2.
> However, patch v2 is flawed because it does not strictly adhere to the 
> following existing rule.
> 
> However, it is worth noting that the current cgroup v2 implementation does 
> not strictly adhere to the following rule either (which is also an 
> objective for patch v3 to address).
> 
> Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
> 
> Using your example to illustrate.
>  Step (refer to the steps in the table below)
>  #1> mkdir -p A1                           
>  #2> echo "0-1" > A1/cpuset.cpus.exclusive 
>  #3> echo "root" > A1/cpuset.cpus.partition
>  #4> mkdir -p B1               
>  #5> echo "0" > B1/cpuset.cpus 
> 
> Table 1: Current result
>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>  #1   | 0      |                |           | member       |              |
>  #2   | 0      | 0-1            |           | member       |              |
>  #3   | 0      | 0-1            |           | root         |              |
>  #4   | 0      | 0-1            |           | root         | member       |
>  #5   | 0      | 0-1            | 0         | root invalid | member       |
> 

I think this is what we expect.

> Table 2: Expected result
>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>  #1   | 0      |                |           | member       |              |
>  #2   | 0      | 0-1            |           | member       |              |
>  #3   | 0      | 0-1            |           | root         |              |
>  #4   | 0      | 0-1            |           | root         | member       |
>  #5   | error  | 0-1            |           | root         | member       |
> 

Step 5 should not return an error. As Longman pointed out, in cgroup-v2, setting cpuset.cpus should
never fail.

> Currently, after step #5, the operation returns success, which clearly 
> violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's 
> "cpuset.cpus.exclusive".
> 
> Therefore, after step #5, the operation should return error, with A1 
> remaining as "root". This better complies with the Rule 1.
> 

This is an exclusivity rule. Since it violates the exclusivity rules, A1 should be invalidated.

> ------
> The following content is provided for reference, and we hope it may be 
> adopted in the future.
> !!These are not part of what patch v3 will implement.
> 
> As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current 
> cgroup v2 behavior. This patch series does not modify it, but we hope to 
> draw the maintainers' attention, as we indeed have plans for future 
> modifications. Our intent can be seen from the following examples.
> 
> For example:
>  Step (refer to the steps in the table below)
>  #1> mkdir -p A1                           
>  #2> echo "0-1"  > A1/cpuset.cpus 
>  #3> echo "root" > A1/cpuset.cpus.partition
>  #4> mkdir -p B1               
>  #5> echo "2-3"  > B1/cpuset.cpus 
>  #6> echo "root" > B1/cpuset.cpus.partition
>  #7> echo "1-2"  > B1/cpuset.cpus
> 
> Table 1: Current result
>  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>  #1   | from parent   |               | member       |              |
>  #2   | 0-1           |               | member       |              |
>  #3   | 0-1           |               | root         |              |
>  #4   | 0-1           | from parent   | root         | member       |
>  #5   | 0-1           | 2-3           | root         | member       |
>  #6   | 0-1           | 2-3           | root         | root         |
>  #7   | 0-1           | 1-2           | root invalid | root invalid |
> 
> Table 2: Expected result
>  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>  #1   | from parent   |               | member       |              |
>  #2   | 0-1           |               | member       |              |
>  #3   | 0-1           |               | root         |              |
>  #4   | 0-1           | from parent   | root         | member       |
>  #5   | 0-1           | 2-3           | root         | member       |
>  #6   | 0-1           | 2-3           | root         | root         |
>  #7   | 0-1           | 2             | root         | root invalid |
> 
> After step #7, we expect A1 to remain "root" (unaffected), while only B1 
> becomes "root invalid".
> 

With the result you expect, would we observe the following behaviors?

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "1-2"  > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "1-2"  > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.
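
The concern can be made concrete with a toy userspace model. This is only a
sketch of the expected rule as described in the quoted tables, not kernel
code: struct toy_cpuset and toy_request_root() are hypothetical names, and
the rule modelled is "a cpuset that asks for root is refused only when it
overlaps a sibling that is already a valid root".

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy partition states; the names mirror the strings used in this thread. */
enum prstate { MEMBER, ROOT, ROOT_INVALID };

struct toy_cpuset {
	uint64_t cpus;		/* requested CPUs, bit n = CPU n */
	enum prstate state;
};

/*
 * Hypothetical "incumbent wins" rule: cs becomes a valid root unless its
 * CPUs overlap a sibling that is already a valid root. The sibling's
 * state is never changed.
 */
static void toy_request_root(struct toy_cpuset *cs,
			     const struct toy_cpuset *sibling)
{
	bool overlap = (cs->cpus & sibling->cpus) != 0;

	if (sibling->state == ROOT && overlap)
		cs->state = ROOT_INVALID;
	else
		cs->state = ROOT;
}
```

With A1 = 0-1 (0x3) and B1 = 1-2 (0x6), calling toy_request_root() on A1
first leaves A1 as root and B1 as root invalid; calling it on B1 first gives
the opposite result. Under this model the final state depends on the order.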

-- 
Best regards,
Ridong

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/14 08:50, Chen Ridong wrote:
>On 2025/11/13 21:14, Sun Shaojie wrote:
>> ...
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 52468d2c178a..3240b3ab5998 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>>   */
>>  static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>>  {
>> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
>> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
>> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>> +		return !cpusets_are_exclusive(cs1, cs2);
>> +
>> +	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
>> +	if (!is_in_v2_mode() &&
>> +	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
>>  		return !cpusets_are_exclusive(cs1, cs2);
>>  
>
>I prefer adding a helper function in the cpuset-v1.c file, similar to cpus_excl_conflict_legacy().
>
>For cpuset v1, it can simply return cpus_excl_conflict_legacy(). It seems that other rules are not
>relevant to v1.
>
>>  	/* Exclusive_cpus cannot intersect */

Hi, Ridong,

Thank you for the suggestion. I will update the patch accordingly.

Thanks,
Sunshaojie
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/14 14:33, Sun Shaojie wrote:
> On 2025/11/14 08:50, Chen Ridong Wrote:
>> On 2025/11/13 21:14, Sun Shaojie wrote:
>>> ...
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 52468d2c178a..3240b3ab5998 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>>>   */
>>>  static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>>>  {
>>> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
>>> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>>> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
>>> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>>> +		return !cpusets_are_exclusive(cs1, cs2);
>>> +
>>> +	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
>>> +	if (!is_in_v2_mode() &&
>>> +	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
>>>  		return !cpusets_are_exclusive(cs1, cs2);
>>>  
>>
>> I prefer adding a helper function in the cpuset-v1.c file, similar to cpus_excl_conflict_legacy().
>>
>> For cpuset v1, it can simply return cpus_excl_conflict_legacy(). It seems that other rules are not
>> relevant to v1.
>>
>>>  	/* Exclusive_cpus cannot intersect */
> 
> Hi, Ridong,
> 
> Thank you for the suggestion.I will update the patch accordingly.
> 

If we are ready to relax this rule, adding the v1 logic in cpuset1_validate_change might be
appropriate. However, as I mentioned in my reply to Michal, I believe further discussion is needed.
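
For reference, the rule under discussion can be modelled in userspace as
below. This is a simplified sketch, not the kernel implementation: the
toy_* names are hypothetical, masks are plain 64-bit words, the overlap test
looks only at cpus_allowed, and the later checks in cpus_excl_conflict()
(such as "Exclusive_cpus cannot intersect") are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpuset: only the fields this check reads. */
struct toy_cpuset {
	uint64_t cpus_allowed;	/* bit n = CPU n */
	bool cpu_exclusive;	/* models is_cpu_exclusive() */
};

/* Simplified model of cpusets_are_exclusive(): the masks do not intersect. */
static bool toy_are_exclusive(const struct toy_cpuset *a,
			      const struct toy_cpuset *b)
{
	return (a->cpus_allowed & b->cpus_allowed) == 0;
}

/* First branch before the v2 patch: checked if either side is exclusive. */
static bool toy_conflict_before(const struct toy_cpuset *a,
				const struct toy_cpuset *b)
{
	if (a->cpu_exclusive || b->cpu_exclusive)
		return !toy_are_exclusive(a, b);
	return false;
}

/* First branches with the v2 patch; v2_mode models is_in_v2_mode(). */
static bool toy_conflict_patched(const struct toy_cpuset *a,
				 const struct toy_cpuset *b, bool v2_mode)
{
	if (a->cpu_exclusive && b->cpu_exclusive)
		return !toy_are_exclusive(a, b);
	if (!v2_mode && a->cpu_exclusive != b->cpu_exclusive)
		return !toy_are_exclusive(a, b);
	return false;
}
```

With an exclusive A1 on 0-1 and a non-exclusive B1 on 0-3, the model reports
a conflict before the patch and in v1 mode, and no conflict in v2 mode with
the patch applied.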

-- 
Best regards,
Ridong
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
Hi, Michal

On 2025/11/14 01:07, Michal Koutný wrote:
>On Thu, Nov 13, 2025 at 09:14:34PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> In cgroup v2, a mutual overlap check is required when at least one of two
>> cpusets is exclusive. However, this check should be relaxed and limited to
>> cases where both cpusets are exclusive.
>> 
>> The table 1 shows the partition states of A1 and B1 after each step before
>> applying this patch.
>> 
>> Table 1: Before applying the patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> mkdir -p A1                            | member       |              |
>>  #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>>  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>  #4> mkdir -p B1                            | root         | member       |
>>  #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>> 
>> After step #5, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
>> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."
>
>I remember there was the addition of cgroup_file_notify() for the
>cpuset.cpus.partition so that such changes can be watched for.
>
>I may not be seeing whole picture, so I ask -- why would it be "more
>reasonable" for A1 to remain root. From this description it looks like
>you'd silently convert B1's effective cpus to 2-3 but IIUC the code
>change that won't happen but you'd reject the write of "0-3" instead.
>

The desired outcome is that after step #5, although B1 writes "0-3" to
cpuset.cpus, A1 remains "root" and B1 ends up with effective CPUs of 2-3.
In summary, we want to avoid invalidating A1 when B1 changes its
cpuset.cpus, because cgroup v2 allows the effective CPU mask of a cpuset
to differ from its requested mask.
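
As a rough illustration of that outcome (a userspace sketch with
hypothetical helper names, not the kernel's cpumask code), B1's effective
mask would be its requested mask with the CPUs owned by the sibling
partition root removed:

```c
#include <stdint.h>

/* Mask with bits lo..hi set (bit n = CPU n); assumes lo <= hi < 64. */
static uint64_t toy_cpu_range(unsigned lo, unsigned hi)
{
	uint64_t upto_hi = (hi == 63) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << (hi + 1)) - 1);
	uint64_t below_lo = (UINT64_C(1) << lo) - 1;

	return upto_hi & ~below_lo;
}

/* Requested CPUs minus those held by a sibling partition root. */
static uint64_t toy_effective_cpus(uint64_t requested,
				   uint64_t sibling_root_cpus)
{
	return requested & ~sibling_root_cpus;
}
```

For the table above, toy_effective_cpus(toy_cpu_range(0, 3),
toy_cpu_range(0, 1)) yields the mask for CPUs 2-3.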

Indeed, this issue was discussed in detail during the v1 review.
https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat.com/T/#u

>Isn't here missing Table 2: After applying the patch? I'm asking because
>of the number 1 but also because it'd make the intention clearer
>;-), perhaps with a column for cpuset.cpus.effective.

Thanks for the suggestion. I will update the patch description accordingly.

Thanks,
Sun Shaojie
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Michal Koutný 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 02:24:48PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> The desired outcome is that after step #5, although B1 writes "0-3" to 
> cpuset.cpus, A1 can still remain as "root", and B1 ends up with effective 
> CPUs of 2-3. In summary, We want to avoid A1's invalidation when B1 
> changes its cpuset.cpus. Because cgroup v2 allows the effective CPU mask 
> of a cpuset to differ from its requested mask.

So the new list of reasons why a configured cpuset's CPUs change is:
- hotplug,
- ancestor's config change,
- stealing by a sibling (new).

IIUC, the patch proposes this behavior:

  echo root >A1.cpuset.partition
  echo 0-1 >A1.cpuset.cpus
  
  echo root >B1.cpuset.partition
  echo 1-2 >B1.cpuset.cpus	# invalidates A1
  
  echo 0-1 >A1.cpuset.cpus	# invalidates B1
  
  ping-pong over CPU 1 ad libitum

I think the right (tm) behavior would be not to depend on the order in
which config is applied to siblings, i.e.

  echo root >A1.cpuset.partition
  echo 0-1 >A1.cpuset.cpus
  
  echo root >B1.cpuset.partition
  echo 1-2 >B1.cpuset.cpus	# invalidates both A1 and B1

  echo 0-1 >A1.cpuset.cpus	# no change anymore

(I hope my example sheds some light on my understanding of the situation
and desired behavior.)
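
A minimal userspace sketch of the order-independent variant, assuming both
siblings have already requested "root": each state is recomputed from the
current configuration alone, so the history of writes does not matter. The
toy_* names are hypothetical and this is not how the kernel tracks
partition state.

```c
#include <stdint.h>

enum prstate { ROOT, ROOT_INVALID };

/* A sibling that has already written "root" to cpuset.cpus.partition. */
struct toy_root {
	uint64_t cpus;		/* requested CPUs, bit n = CPU n */
	enum prstate state;
};

/* Valid only with a non-empty mask that does not overlap the sibling. */
static enum prstate toy_state(uint64_t own, uint64_t other)
{
	if (own == 0 || (own & other) != 0)
		return ROOT_INVALID;
	return ROOT;
}

/* Models a write to cpuset.cpus: update cs, then revalidate both sides. */
static void toy_write_cpus(struct toy_root *cs, struct toy_root *sibling,
			   uint64_t cpus)
{
	cs->cpus = cpus;
	cs->state = toy_state(cs->cpus, sibling->cpus);
	sibling->state = toy_state(sibling->cpus, cs->cpus);
}
```

Writing 0-1 to A1 and then 1-2 to B1 invalidates both, and rewriting 0-1 to
A1 changes nothing; doing the two writes in the other order ends in the same
state.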

Thanks,
Michal
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/15 0:15, Michal Koutný wrote:
>On Fri, Nov 14, 2025 at 02:24:48PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> The desired outcome is that after step #5, although B1 writes "0-3" to 
>> cpuset.cpus, A1 can still remain as "root", and B1 ends up with effective 
>> CPUs of 2-3. In summary, We want to avoid A1's invalidation when B1 
>> changes its cpuset.cpus. Because cgroup v2 allows the effective CPU mask 
>> of a cpuset to differ from its requested mask.
>
>So the new list of reasons why configured cpuset's cpus change are:
>- hotplug,
>- ancestor's config change,
>- stealing by a sibling (new).
>
>IIUC, the patch proposes this behavior:
>
>  echo root >A1.cpuset.partition
>  echo 0-1 >A1.cpuset.cpus
>  
>  echo root >B1.cpuset.partition
>  echo 1-2 >B1.cpuset.cpus	# invalidates A1
>  
>  echo 0-1 >A1.cpuset.cpus	# invalidates B1
>  
>  ping-pong over CPU 1 ad libitum
>
>I think the right (tm) behavior would be not to depend on the order in
>which config is applied to siblings, i.e.
>
>  echo root >A1.cpuset.partition
>  echo 0-1 >A1.cpuset.cpus
>  
>  echo root >B1.cpuset.partition
>  echo 1-2 >B1.cpuset.cpus	# invalidates both A1 and B1
>
>  echo 0-1 >A1.cpuset.cpus	# no change anymore
>
>(I hope my example sheds some light on my understanding of the situation
>and desired behavior.)

Hi, Michal

The current patch is flawed and will be fixed in patch v3. However, the 
example you provided also has issues. Below, I’ll explain your example.

Table 1: current result for your example 1.
 Step                                | A1's prstate | B1's prstate |
 #1> echo root > A1.cpuset.partition | root invalid |              |
 #2> echo 0-1 > A1.cpuset.cpus       | root         |              |
 #3> echo root > B1.cpuset.partition | root         | root invalid |
 #4> echo 1-2 > B1.cpuset.cpus       | root invalid | root invalid |
 #5> echo 0-1 >A1.cpuset.cpus        | root invalid | root invalid |

After step #4, both A1 and B1 are already in the "root invalid" state.
Therefore, step #5 does not cause B1 to become "root invalid"; B1 was
already in that state before step #5.

Table 2: this is my expected result 
 Step                                | A1's prstate | B1's prstate |
 #1> echo root > A1.cpuset.partition | root invalid |              |
 #2> echo 0-1 > A1.cpuset.cpus       | root         |              |
 #3> echo root > B1.cpuset.partition | root         | root invalid |
 #4> echo 1-2 > B1.cpuset.cpus       | root         | root invalid |
 #5> echo 0-1 >A1.cpuset.cpus        | root         | root invalid |

If A1 is "root" and B1 is not "root", our goal is to ensure that B1's
behavior does not affect the "root" state of A1. Similarly, when B1's
"cpuset.cpus.effective" is non-empty, we strive to ensure, as far as
possible, that A1's own behavior does not affect its "root" state.

In summary, the purpose of submitting this patch is to ensure that, when 
only one of A1 and B1 is "root", the actions of one party do not affect the
"root" state of the other.

Thanks,
Sun Shaojie

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/15 0:15, Michal Koutný wrote:
> On Fri, Nov 14, 2025 at 02:24:48PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> The desired outcome is that after step #5, although B1 writes "0-3" to 
>> cpuset.cpus, A1 can still remain as "root", and B1 ends up with effective 
>> CPUs of 2-3. In summary, We want to avoid A1's invalidation when B1 
>> changes its cpuset.cpus. Because cgroup v2 allows the effective CPU mask 
>> of a cpuset to differ from its requested mask.
> 
> So the new list of reasons why configured cpuset's cpus change are:
> - hotplug,
> - ancestor's config change,
> - stealing by a sibling (new).
> 
> IIUC, the patch proposes this behavior:
> 
>   echo root >A1.cpuset.partition
>   echo 0-1 >A1.cpuset.cpus
>   
>   echo root >B1.cpuset.partition
>   echo 1-2 >B1.cpuset.cpus	# invalidates A1
>   
>   echo 0-1 >A1.cpuset.cpus	# invalidates B1
>   
>   ping-pong over CPU 1 ad libitum
> 
> I think the right (tm) behavior would be not to depend on the order in
> which config is applied to siblings, i.e.
> 
>   echo root >A1.cpuset.partition
>   echo 0-1 >A1.cpuset.cpus
>   
>   echo root >B1.cpuset.partition
>   echo 1-2 >B1.cpuset.cpus	# invalidates both A1 and B1
> 
>   echo 0-1 >A1.cpuset.cpus	# no change anymore
> 
> (I hope my example sheds some light on my understanding of the situation
> and desired behavior.)

Before applying the patch, the behavior I got:

	# cd /sys/fs/cgroup/
	# mkdir A1
	# mkdir B1
	# echo root > A1/cpuset.cpus.partition
	# echo 0-1 > A1/cpuset.cpus
	# cat A1/cpuset.cpus.partition
	root
	# echo root > B1/cpuset.cpus.partition
	# echo 1-2 > B1/cpuset.cpus  # A1 is exclusive, invalidate both A1 and B1
	# cat A1/cpuset.cpus.partition
	root invalid
	# cat B1/cpuset.cpus.partition
	root invalid (cpuset.cpus and cpuset.cpus.exclusive are empty)
	# echo root > B1/cpuset.cpus.partition
	# cat B1/cpuset.cpus.partition
	root invalid (Cpu list in cpuset.cpus not exclusive)
	# echo root > A1/cpuset.cpus.partition
	# cat A1/cpuset.cpus.partition
	root invalid (Cpu list in cpuset.cpus not exclusive)
	#

After applying the patch, the behavior I got:

	# cd /sys/fs/cgroup/
	# mkdir A1
	# mkdir B1
	# echo root > A1/cpuset.cpus.partition
	# echo 0-1 > A1/cpuset.cpus
	# cat A1/cpuset.cpus.partition
	root
	# echo root > B1/cpuset.cpus.partition
	# echo 1-2 > B1/cpuset.cpus # A1 is exclusive, B1 is going to be exclusive
	# cat A1/cpuset.cpus.partition
	root
	# cat B1/cpuset.cpus.partition # A1 and B1 should be invalid.
	root
	# echo member > B1/cpuset.cpus.partition
	# echo root > B1/cpuset.cpus.partition
	# cat A1/cpuset.cpus.partition
	root
	# cat B1/cpuset.cpus.partition
	root invalid (Cpu list in cpuset.cpus not exclusive)
	# echo member > A1/cpuset.cpus.partition
	# echo root > B1/cpuset.cpus.partition
	# echo root > A1/cpuset.cpus.partition
	# cat A1/cpuset.cpus.partition
	root invalid (Cpu list in cpuset.cpus not exclusive)
	# cat B1/cpuset.cpus.partition
	root

After applying the patch, the result is unexpected.

-- 
Best regards,
Ridong

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/15 10:01, Chen Ridong wrote:
> 
> 
> On 2025/11/15 0:15, Michal Koutný wrote:
>> On Fri, Nov 14, 2025 at 02:24:48PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>> The desired outcome is that after step #5, although B1 writes "0-3" to 
>>> cpuset.cpus, A1 can still remain as "root", and B1 ends up with effective 
>>> CPUs of 2-3. In summary, We want to avoid A1's invalidation when B1 
>>> changes its cpuset.cpus. Because cgroup v2 allows the effective CPU mask 
>>> of a cpuset to differ from its requested mask.
>>
>> So the new list of reasons why configured cpuset's cpus change are:
>> - hotplug,
>> - ancestor's config change,
>> - stealing by a sibling (new).
>>
>> IIUC, the patch proposes this behavior:
>>
>>   echo root >A1.cpuset.partition
>>   echo 0-1 >A1.cpuset.cpus
>>   
>>   echo root >B1.cpuset.partition
>>   echo 1-2 >B1.cpuset.cpus	# invalidates A1
>>   
>>   echo 0-1 >A1.cpuset.cpus	# invalidates B1
>>   
>>   ping-pong over CPU 1 ad libitum
>>
>> I think the right (tm) behavior would be not to depend on the order in
>> which config is applied to siblings, i.e.
>>
>>   echo root >A1.cpuset.partition
>>   echo 0-1 >A1.cpuset.cpus
>>   
>>   echo root >B1.cpuset.partition
>>   echo 1-2 >B1.cpuset.cpus	# invalidates both A1 and B1
>>
>>   echo 0-1 >A1.cpuset.cpus	# no change anymore
>>
>> (I hope my example sheds some light on my understanding of the situation
>> and desired behavior.)
> 
> Before applying the patch, the behavior I got:
> 
> 	# cd /sys/fs/cgroup/
> 	# mkdir A1
> 	# mkdir B1
> 	# echo root > A1/cpuset.cpus.partition
> 	# echo 0-1 > A1/cpuset.cpus
> 	# cat A1/cpuset.cpus.partition
> 	root
> 	# echo root > B1/cpuset.cpus.partition
> 	# echo 1-2 > B1/cpuset.cpus  # A1 is exclusive, invalidate both A1 and B1
> 	# cat A1/cpuset.cpus.partition
> 	root invalid
> 	# cat B1/cpuset.cpus.partition
> 	root invalid (cpuset.cpus and cpuset.cpus.exclusive are empty)
> 	# echo root > B1/cpuset.cpus.partition
> 	# cat B1/cpuset.cpus.partition
> 	root invalid (Cpu list in cpuset.cpus not exclusive)
> 	# echo root > A1/cpuset.cpus.partition
> 	# cat A1/cpuset.cpus.partition
> 	root invalid (Cpu list in cpuset.cpus not exclusive)
> 	#
> 
> After applying the patch, the behavior I got:
> 
> 	# cd /sys/fs/cgroup/
> 	# mkdir A1
> 	#  mkdir B1
> 	# echo root > A1/cpuset.cpus.partition
> 	# echo 0-1 > A1/cpuset.cpus
> 	# cat A1/cpuset.cpus.partition
> 	root
> 	# echo root > B1/cpuset.cpus.partition
> 	# echo 1-2 > B1/cpuset.cpus # A1 is exclusive, B1 is going to be exclusive
> 	# cat A1/cpuset.cpus.partition
> 	root
> 	# cat B1/cpuset.cpus.partition # A1 and B1 should be invalid.
> 	root
> 	# echo member > B1/cpuset.cpus.partition
> 	# echo root > B1/cpuset.cpus.partition
> 	# cat A1/cpuset.cpus.partition
> 	root
> 	# cat B1/cpuset.cpus.partition
> 	root invalid (Cpu list in cpuset.cpus not exclusive)
> 	# echo member > A1/cpuset.cpus.partition
> 	# echo root > B1/cpuset.cpus.partition
> 	# echo root > A1/cpuset.cpus.partition
> 	# cat A1/cpuset.cpus.partition
> 	root invalid (Cpu list in cpuset.cpus not exclusive)
> 	# cat B1/cpuset.cpus.partition
> 	root
> 
> After applying the patch, The result is unexpected.
> 

This may trigger another related corner case; I sent a patch to fix it:

https://lore.kernel.org/cgroups/20251115093140.1121329-1-chenridong@huaweicloud.com/T/#mfc4157e23d253b71ef9a2cfa5cb54bf41449840c

-- 
Best regards,
Ridong

[PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Chen Ridong 2 months, 3 weeks ago
From: Chen Ridong <chenridong@huawei.com>

A test scenario revealed inconsistent results based on operation order:
Scenario 1:
	#cd /sys/fs/cgroup/
	#mkdir A1
	#mkdir B1
	#echo 1-2 > B1/cpuset.cpus
	#echo 0-1 > A1/cpuset.cpus
	#echo root > A1/cpuset.cpus.partition
	#cat A1/cpuset.cpus.partition
	root invalid (Cpu list in cpuset.cpus not exclusive)

Scenario 2:
	#cd /sys/fs/cgroup/
	#mkdir A1
	#mkdir B1
	#echo 1-2 > B1/cpuset.cpus
	#echo root > A1/cpuset.cpus.partition
	#echo 0-1 > A1/cpuset.cpus
	#cat A1/cpuset.cpus.partition
	root

The second scenario produces an unexpected result: A1 should be marked
as invalid but is incorrectly recognized as valid. This occurs because
when validate_change is invoked, A1 (in root-invalid state) may
automatically transition to a valid partition, with non-exclusive state
checks against siblings, leading to incorrect validation.

To fix this inconsistency, treat trialcs in root-invalid state as exclusive
during validation and set the corresponding exclusive flags, ensuring
consistent behavior regardless of operation order.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index daf813386260..a189f356b5f1 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2526,6 +2526,18 @@ static void partition_cpus_change(struct cpuset *cs, struct cpuset *trialcs,
 	}
 }
 
+static int init_trialcs(struct cpuset *cs, struct cpuset *trialcs)
+{
+	trialcs->prs_err = PERR_NONE;
+	/*
+	 * If partition_root_state != 0, it may automatically change to a partition,
+	 * Therefore, we should treat trialcs as exclusive during validation
+	 */
+	if (trialcs->partition_root_state)
+		set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+	return compute_trialcs_excpus(trialcs, cs);
+}
+
 /**
  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
  * @cs: the cpuset to consider
@@ -2551,9 +2563,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_tmpmasks(&tmp))
 		return -ENOMEM;
 
-	compute_trialcs_excpus(trialcs, cs);
-	trialcs->prs_err = PERR_NONE;
-
+	init_trialcs(cs, trialcs);
 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
 	if (retval < 0)
 		goto out_free;
@@ -2612,7 +2622,7 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	 * Reject the change if there is exclusive CPUs conflict with
 	 * the siblings.
 	 */
-	if (compute_trialcs_excpus(trialcs, cs))
+	if (init_trialcs(cs, trialcs))
 		return -EINVAL;
 
 	/*
@@ -2628,7 +2638,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_tmpmasks(&tmp))
 		return -ENOMEM;
 
-	trialcs->prs_err = PERR_NONE;
 	partition_cpus_change(cs, trialcs, &tmp);
 
 	spin_lock_irq(&callback_lock);
-- 
2.34.1
Re: [PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Waiman Long 2 months, 3 weeks ago
On 11/15/25 4:31 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A test scenario revealed inconsistent results based on operation order:
> Scenario 1:
> 	#cd /sys/fs/cgroup/
> 	#mkdir A1
> 	#mkdir B1
> 	#echo 1-2 > B1/cpuset.cpus
> 	#echo 0-1 > A1/cpuset.cpus
> 	#echo root > A1/cpuset.cpus.partition
> 	#cat A1/cpuset.cpus.partition
> 	root invalid (Cpu list in cpuset.cpus not exclusive)
>
> Scenario 2:
> 	#cd /sys/fs/cgroup/
> 	#mkdir A1
> 	#mkdir B1
> 	#echo 1-2 > B1/cpuset.cpus
> 	#echo root > A1/cpuset.cpus.partition
> 	#echo 0-1 > A1/cpuset.cpus
> 	#cat A1/cpuset.cpus.partition
> 	root
>
> The second scenario produces an unexpected result: A1 should be marked
> as invalid but is incorrectly recognized as valid. This occurs because
> when validate_change is invoked, A1 (in root-invalid state) may
> automatically transition to a valid partition, with non-exclusive state
> checks against siblings, leading to incorrect validation.
>
> To fix this inconsistency, treat trialcs in root-invalid state as exclusive
> during validation and set the corresponding exclusive flags, ensuring
> consistent behavior regardless of operation order.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>   kernel/cgroup/cpuset.c | 19 ++++++++++++++-----
>   1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index daf813386260..a189f356b5f1 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2526,6 +2526,18 @@ static void partition_cpus_change(struct cpuset *cs, struct cpuset *trialcs,
>   	}
>   }
>   
> +static int init_trialcs(struct cpuset *cs, struct cpuset *trialcs)
> +{
> +	trialcs->prs_err = PERR_NONE;
> +	/*
> +	 * If partition_root_state != 0, it may automatically change to a partition,
> +	 * Therefore, we should treat trialcs as exclusive during validation
> +	 */
> +	if (trialcs->partition_root_state)
> +		set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
Nit: We usually use the non-atomic version, __set_bit(), if concurrent
access isn't possible, which is true in this case.
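
To illustrate the distinction outside the kernel, here is a userspace C11
analogue. It is only a sketch: the toy_* helpers are hypothetical, operate
on a single word, and assume nr is smaller than the width of unsigned long,
whereas the kernel helpers work on bitmap arrays.

```c
#include <stdatomic.h>

/* Analogue of set_bit(): an atomic read-modify-write, safe under races. */
static void toy_set_bit_atomic(unsigned nr, atomic_ulong *word)
{
	atomic_fetch_or(word, 1UL << nr);
}

/* Analogue of __set_bit(): a plain read-modify-write with no ordering cost. */
static void toy_set_bit_plain(unsigned nr, unsigned long *word)
{
	*word |= 1UL << nr;
}
```

Both produce the same value when only one thread touches the word; the plain
form simply avoids the cost of the atomic operation.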

> +	return compute_trialcs_excpus(trialcs, cs);
> +}
> +
>   /**
>    * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
>    * @cs: the cpuset to consider
> @@ -2551,9 +2563,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>   	if (alloc_tmpmasks(&tmp))
>   		return -ENOMEM;
>   
> -	compute_trialcs_excpus(trialcs, cs);
> -	trialcs->prs_err = PERR_NONE;
> -
> +	init_trialcs(cs, trialcs);
>   	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
>   	if (retval < 0)
>   		goto out_free;
> @@ -2612,7 +2622,7 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>   	 * Reject the change if there is exclusive CPUs conflict with
>   	 * the siblings.
>   	 */
> -	if (compute_trialcs_excpus(trialcs, cs))
> +	if (init_trialcs(cs, trialcs))
>   		return -EINVAL;
>   
>   	/*
> @@ -2628,7 +2638,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>   	if (alloc_tmpmasks(&tmp))
>   		return -ENOMEM;
>   
> -	trialcs->prs_err = PERR_NONE;
>   	partition_cpus_change(cs, trialcs, &tmp);
>   
>   	spin_lock_irq(&callback_lock);
Acked-by: Waiman Long <longman@redhat.com>
Re: [PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/15 09:31, Chen Ridong wrote:
>A test scenario revealed inconsistent results based on operation order:
>Scenario 1:
>	#cd /sys/fs/cgroup/
>	#mkdir A1
>	#mkdir B1
>	#echo 1-2 > B1/cpuset.cpus
>	#echo 0-1 > A1/cpuset.cpus
>	#echo root > A1/cpuset.cpus.partition
>	#cat A1/cpuset.cpus.partition
>	root invalid (Cpu list in cpuset.cpus not exclusive)
>
>Scenario 2:
>	#cd /sys/fs/cgroup/
>	#mkdir A1
>	#mkdir B1
>	#echo 1-2 > B1/cpuset.cpus
>	#echo root > A1/cpuset.cpus.partition
>	#echo 0-1 > A1/cpuset.cpus
>	#cat A1/cpuset.cpus.partition
>	root
>
>The second scenario produces an unexpected result: A1 should be marked
>as invalid but is incorrectly recognized as valid. This occurs because
>when validate_change is invoked, A1 (in root-invalid state) may
>automatically transition to a valid partition, with non-exclusive state
>checks against siblings, leading to incorrect validation.
>
>To fix this inconsistency, treat trialcs in root-invalid state as exclusive
>during validation and set the corresponding exclusive flags, ensuring
>consistent behavior regardless of operation order.
>
>Signed-off-by: Chen Ridong <chenridong@huawei.com>
>---
> kernel/cgroup/cpuset.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
>diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>index daf813386260..a189f356b5f1 100644
>--- a/kernel/cgroup/cpuset.c
>+++ b/kernel/cgroup/cpuset.c
>@@ -2526,6 +2526,18 @@ static void partition_cpus_change(struct cpuset *cs, struct cpuset *trialcs,
> 	}
> }
> 
>+static int init_trialcs(struct cpuset *cs, struct cpuset *trialcs)
>+{
>+	trialcs->prs_err = PERR_NONE;
>+	/*
>+	 * If partition_root_state != 0, it may automatically change to a partition,
>+	 * Therefore, we should treat trialcs as exclusive during validation
>+	 */
>+	if (trialcs->partition_root_state)
>+		set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
>+	return compute_trialcs_excpus(trialcs, cs);
>+}
>+
> /**
>  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
>  * @cs: the cpuset to consider
>@@ -2551,9 +2563,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> 	if (alloc_tmpmasks(&tmp))
> 		return -ENOMEM;
> 
>-	compute_trialcs_excpus(trialcs, cs);
>-	trialcs->prs_err = PERR_NONE;
>-
>+	init_trialcs(cs, trialcs);
> 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
> 	if (retval < 0)
> 		goto out_free;
>@@ -2612,7 +2622,7 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> 	 * Reject the change if there is exclusive CPUs conflict with
> 	 * the siblings.
> 	 */
>-	if (compute_trialcs_excpus(trialcs, cs))
>+	if (init_trialcs(cs, trialcs))
> 		return -EINVAL;
> 
> 	/*
>@@ -2628,7 +2638,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> 	if (alloc_tmpmasks(&tmp))
> 		return -ENOMEM;
> 
>-	trialcs->prs_err = PERR_NONE;
> 	partition_cpus_change(cs, trialcs, &tmp);
> 
> 	spin_lock_irq(&callback_lock);

Hi, Ridong,

This patch may not handle the following case correctly:
 Step
 #1> echo "root" > A1/cpuset.cpus.partition
 #2> echo "0-1" > B1/cpuset.cpus
 #3> echo "1-2" > A1/cpuset.cpus.exclusive  -> return error
 It should return success here.

Please consider the following modification.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..b4085438368c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -609,6 +609,9 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;
 
+	if (cpumask_empty(cs1->exclusive_cpus))
+		return cpumask_intersects(cs1->cpus_allowed, cs2->cpus_allowed);
+
 	return false;
 }
 
Thanks,
Sun Shaojie
Re: [PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/17 12:35, Sun Shaojie wrote:
> On 2025/11/15 09:31, Chen Ridong wrote:
>> A test scenario revealed inconsistent results based on operation order:
>> Scenario 1:
>> 	#cd /sys/fs/cgroup/
>> 	#mkdir A1
>> 	#mkdir B1
>> 	#echo 1-2 > B1/cpuset.cpus
>> 	#echo 0-1 > A1/cpuset.cpus
>> 	#echo root > A1/cpuset.cpus.partition
>> 	#cat A1/cpuset.cpus.partition
>> 	root invalid (Cpu list in cpuset.cpus not exclusive)
>>
>> Scenario 2:
>> 	#cd /sys/fs/cgroup/
>> 	#mkdir A1
>> 	#mkdir B1
>> 	#echo 1-2 > B1/cpuset.cpus
>> 	#echo root > A1/cpuset.cpus.partition
>> 	#echo 0-1 > A1/cpuset.cpus
>> 	#cat A1/cpuset.cpus.partition
>> 	root
>>
>> The second scenario produces an unexpected result: A1 should be marked
>> as invalid but is incorrectly recognized as valid. This occurs because
>> when validate_change is invoked, A1 (in root-invalid state) may
>> automatically transition to a valid partition, with non-exclusive state
>> checks against siblings, leading to incorrect validation.
>>
>> To fix this inconsistency, treat trialcs in root-invalid state as exclusive
>> during validation and set the corresponding exclusive flags, ensuring
>> consistent behavior regardless of operation order.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset.c | 19 ++++++++++++++-----
>> 1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index daf813386260..a189f356b5f1 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2526,6 +2526,18 @@ static void partition_cpus_change(struct cpuset *cs, struct cpuset *trialcs,
>> 	}
>> }
>>
>> +static int init_trialcs(struct cpuset *cs, struct cpuset *trialcs)
>> +{
>> +	trialcs->prs_err = PERR_NONE;
>> +	/*
>> +	 * If partition_root_state != 0, it may automatically change to a partition,
>> +	 * Therefore, we should treat trialcs as exclusive during validation
>> +	 */
>> +	if (trialcs->partition_root_state)
>> +		set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
>> +	return compute_trialcs_excpus(trialcs, cs);
>> +}
>> +
>> /**
>>  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
>>  * @cs: the cpuset to consider
>> @@ -2551,9 +2563,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>> 	if (alloc_tmpmasks(&tmp))
>> 		return -ENOMEM;
>>
>> -	compute_trialcs_excpus(trialcs, cs);
>> -	trialcs->prs_err = PERR_NONE;
>> -
>> +	init_trialcs(cs, trialcs);
>> 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
>> 	if (retval < 0)
>> 		goto out_free;
>> @@ -2612,7 +2622,7 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>> 	 * Reject the change if there is exclusive CPUs conflict with
>> 	 * the siblings.
>> 	 */
>> -	if (compute_trialcs_excpus(trialcs, cs))
>> +	if (init_trialcs(cs, trialcs))
>> 		return -EINVAL;
>>
>> 	/*
>> @@ -2628,7 +2638,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>> 	if (alloc_tmpmasks(&tmp))
>> 		return -ENOMEM;
>>
>> -	trialcs->prs_err = PERR_NONE;
>> 	partition_cpus_change(cs, trialcs, &tmp);
>>
>> 	spin_lock_irq(&callback_lock);
> 
> Hi, Ridong,
> 
> Maybe, this patch does not apply to the following cases:
>  Step
>  #1> echo "root" > A1/cpuset.cpus.partition
>  #1> echo "0-1" > B1/cpuset.cpus
>  #2> echo "1-2" > A1/cpuset.cpus.exclusive  -> return error
>  It should return success here.
> 
> Please consider the following modification.
> 

If A1 will automatically change to a valid partition, I think it should return an error.

Thanks.

-- 
Best regards,
Ridong
Re: [PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/15 14:23, Chen Ridong wrote:
>On 2025/11/17 12:35, Sun Shaojie wrote:
>> Hi, Ridong,
>> 
>> Maybe, this patch does not apply to the following cases:
>>  Step
>>  #1> echo "root" > A1/cpuset.cpus.partition
>>  #1> echo "0-1" > B1/cpuset.cpus
>>  #2> echo "1-2" > A1/cpuset.cpus.exclusive  -> return error
>>  It should return success here.
>> 
>> Please consider the following modification.
>> 
>
>If A1 will automatically change to a valid partition, I think it should return error.

Hi, Ridong,

A1 will not automatically change to a valid partition.

Perhaps this example is more intuitive.

For example:

 Before applying this patch:
 #1> echo "0-1" > B1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition -> A1's prstate is "root invalid"
 #3> echo "1-2" > A1/cpuset.cpus.exclusive
 Return success, and A1's prstate is "root invalid"

 After applying this patch:
 #1> echo "0-1" > B1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition -> A1's prstate is "root invalid"
 #3> echo "1-2" > A1/cpuset.cpus.exclusive
 Return error, and A1's prstate is "root invalid"

 It should return success here, because A1's prstate is "root invalid".
 
For this example, the behavior should remain consistent before and after 
applying the patch. This is because when A1 is in the "root invalid" state,
its behavior is equivalent to that of a "member," meaning A1's 
cpuset.cpus.exclusive and B1's cpuset.cpus are allowed to overlap.

Thanks,
Sun Shaojie
Re: [PATCH -next] cpuset: treate root invalid trialcs as exclusive
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/17 14:53, Sun Shaojie wrote:
> On 2025/11/15 14:23, Chen Ridong wrote:
>> On 2025/11/17 12:35, Sun Shaojie wrote:
>>> Hi, Ridong,
>>>
>>> Maybe, this patch does not apply to the following cases:
>>>  Step
>>>  #1> echo "root" > A1/cpuset.cpus.partition
>>>  #1> echo "0-1" > B1/cpuset.cpus
>>>  #2> echo "1-2" > A1/cpuset.cpus.exclusive  -> return error
>>>  It should return success here.
>>>
>>> Please consider the following modification.
>>>
>>
>> If A1 will automatically change to a valid partition, I think it should return error.
> 
> Hi, Ridong,
> 
> A1 will not automatically change to a valid partition.
> 
> Perhaps this example is more intuitive.
> 
> For example:
> 
>  Before apply this patch:
>  #1> echo "0-1" > B1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition -> A1's prstate is "root invalid"
>  #3> echo "1-2" > A1/cpuset.cpus.exclusive
>  Return success, and A1's prstate is "root invalid"
> 

I tested my patch without applying yours. Here are the results I obtained:

	# cd /sys/fs/cgroup/
	# mkdir A1
	# mkdir B1
	# echo 0-1 > B1/cpuset.cpus
	# echo root > A1/cpuset.cpus.partition
	# cat A1/cpuset.cpus.partition
	root invalid (cpuset.cpus and cpuset.cpus.exclusive are empty)
	# echo 1-2 > A1/cpuset.cpus
	# cat A1/cpuset.cpus.partition
	root

This differs from the results you provided.

Never mind, let's focus on whether the rule should be relaxed in your patch. Once that's resolved, I
can resubmit my patch. Let's set this patch aside for now.

Thanks.

-- 
Best regards,
Ridong
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/15 15:41, Chen Ridong wrote:
>> Our product need ensure the following behavior: in cgroup-v2, user 
>> modifications to one cpuset should not affect the partition state of its 
>> sibling cpusets. This is justified and meaningful, as it aligns with the 
>> isolation characteristics of cgroups.
>> 
>
>This is ideal in theory, but I don’t think it’s practical in reality.
>
>> This can be divided into two scenarios:
>> Scenario 1: Only one of A1 and B1 is "root".
>> Scenario 2: Both A1 and B1 are "root".
>> 
>> We plan to implement Scenario 1 first. This is the goal of patch v2.
>> However, patch v2 is flawed because it does not strictly adhere to the 
>> following existing rule.
>> 
>> However, it is worth noting that the current cgroup v2 implementation does 
>> not strictly adhere to the following rule either (which is also an 
>> objective for patch v3 to address).
>> 
>> Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
>> 
>> Using your example to illustrate.
>>  Step (refer to the steps in the table below)
>>  #1> mkdir -p A1                           
>>  #2> echo "0-1" > A1/cpuset.cpus.exclusive 
>>  #3> echo "root" > A1/cpuset.cpus.partition
>>  #4> mkdir -p B1               
>>  #5> echo "0" > B1/cpuset.cpus 
>> 
>> Table 1: Current result
>>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>>  #1   | 0      |                |           | member       |              |
>>  #2   | 0      | 0-1            |           | member       |              |
>>  #3   | 0      | 0-1            |           | root         |              |
>>  #4   | 0      | 0-1            |           | root         | member       |
>>  #5   | 0      | 0-1            | 0         | root invalid | member       |
>> 
>
>I think this what we expect.
>
>> Table 2: Expected result
>>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>>  #1   | 0      |                |           | member       |              |
>>  #2   | 0      | 0-1            |           | member       |              |
>>  #3   | 0      | 0-1            |           | root         |              |
>>  #4   | 0      | 0-1            |           | root         | member       |
>>  #5   | error  | 0-1            |           | root         | member       |
>> 
>
>Step 5 should not return an error. As Longman pointed out, in cgroup-v2, setting cpuset.cpus should
>never fail.

Hi, Ridong,

Thank you for your correction. Will update.

Thanks,
Sunshaojie.
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Sun Shaojie 2 months, 3 weeks ago
On 2025/11/13 22:57, Waiman Long wrote:
>On 11/13/25 8:14 AM, Sun Shaojie wrote:
>> ...
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 52468d2c178a..3240b3ab5998 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>>    */
>>   static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>>   {
>> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
>> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
>> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>> +		return !cpusets_are_exclusive(cs1, cs2);
>> +
>> +	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
>> +	if (!is_in_v2_mode() &&
>
>You should just use cpuset_v2() here as is_in_v2_mode() checks an 
>additional v1 specific mode that is irrelevant wrt to exclusive bit 
>handling. Also please update the functional comment about difference in 
>v1 vs. v2 behavior.
>
>Note that we may have to update other conflict checking code in cpuset.c 
>to make this new behavior more consistent.
>
>Thanks,
>Longman
>
>> +	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
>>   		return !cpusets_are_exclusive(cs1, cs2);
>>   
>>   	/* Exclusive_cpus cannot intersect */

Thank you for the correction. I will update the patch accordingly.

Thanks,
Sun Shaojie
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Michal Koutný 2 months, 3 weeks ago
Hello.

On Thu, Nov 13, 2025 at 09:14:34PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> In cgroup v2, a mutual overlap check is required when at least one of two
> cpusets is exclusive. However, this check should be relaxed and limited to
> cases where both cpusets are exclusive.
> 
> The table 1 shows the partition states of A1 and B1 after each step before
> applying this patch.
> 
> Table 1: Before applying the patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> mkdir -p A1                            | member       |              |
>  #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>  #4> mkdir -p B1                            | root         | member       |
>  #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
> 
> After step #5, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."

I remember there was the addition of cgroup_file_notify() for the
cpuset.cpus.partition so that such changes can be watched for.

I may not be seeing the whole picture, so I ask -- why would it be "more
reasonable" for A1 to remain root? From this description it looks like
you'd silently convert B1's effective cpus to 2-3, but IIUC the code
change, that won't happen; you'd reject the write of "0-3" instead.

Isn't Table 2 (after applying the patch) missing here? I'm asking because
of the number 1, but also because it'd make the intention clearer
;-), perhaps with a column for cpuset.cpus.effective.

Thanks,
Michal
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/14 1:07, Michal Koutný wrote:
> Hello.
> 
> On Thu, Nov 13, 2025 at 09:14:34PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> In cgroup v2, a mutual overlap check is required when at least one of two
>> cpusets is exclusive. However, this check should be relaxed and limited to
>> cases where both cpusets are exclusive.
>>
>> The table 1 shows the partition states of A1 and B1 after each step before
>> applying this patch.
>>
>> Table 1: Before applying the patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> mkdir -p A1                            | member       |              |
>>  #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>>  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>  #4> mkdir -p B1                            | root         | member       |
>>  #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>>
>> After step #5, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
>> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."
> 
> I remember there was the addition of cgroup_file_notify() for the
> cpuset.cpus.partition so that such changes can be watched for.
> 

This behavior is visible to user space, I think.

After further consideration, I still suggest retaining this rule.

If we relax this rule, should the following checks also be relaxed?

	/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
	if (!cpumask_empty(cs1->cpus_allowed) &&
	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
		return true;

	if (!cpumask_empty(cs2->cpus_allowed) &&
	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
		return true;


For example:
  Step                                       | A1's prstate | B1's prstate |
  #1> mkdir -p A1                            | member       |              |
  #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
  #4> mkdir -p B1                            | root         | member       |
  #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |

Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
invalidated? B1 could also use the parent's effective CPUs, right?
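
To make the example concrete, here is a minimal userspace sketch of just the
two subset checks quoted above, assuming one bit per CPU in a 32-bit mask.
The struct and function names (sketch_cpuset, sketch_mask_subset,
sketch_subset_conflict) are hypothetical stand-ins, not the kernel's types,
and the other checks in cpus_excl_conflict() are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset: bit n set means CPU n. */
struct sketch_cpuset {
	uint32_t cpus_allowed;   /* models cpuset.cpus */
	uint32_t exclusive_cpus; /* models cpuset.cpus.exclusive */
};

/* True when every CPU in a is also in b (an empty a is a subset of anything). */
static bool sketch_mask_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;
}

/*
 * Models only the two quoted checks: a non-empty cpus_allowed must not be
 * a subset of the sibling's exclusive_cpus.
 */
static bool sketch_subset_conflict(const struct sketch_cpuset *cs1,
				   const struct sketch_cpuset *cs2)
{
	if (cs1->cpus_allowed != 0 &&
	    sketch_mask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
		return true;

	if (cs2->cpus_allowed != 0 &&
	    sketch_mask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
		return true;

	return false;
}
```

With A1 = { cpus_allowed empty, exclusive_cpus 0-1 } and B1 = { cpus_allowed 0 },
the sketch reports a conflict at step #5. With B1's cpus_allowed widened to
0-2 it does not, because 0-2 is no longer a subset of 0-1.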

This raises the question: Should we relax the restriction to allow a cpuset's cpus to be a subset of
its siblings' exclusive_cpus, thereby keeping A1 valid? If we do this, users may struggle to
understand what their cpuset.cpus.effective value is (and why it has that value)—contrary to their
expectations.

> I may not be seeing whole picture, so I ask -- why would it be "more
> reasonable" for A1 to remain root. From this description it looks like
> you'd silently convert B1's effective cpus to 2-3 but IIUC the code
> change that won't happen but you'd reject the write of "0-3" instead.
> 
> Isn't here missing Table 2: After applying the patch? I'm asking because
> of the number 1 but also because it'd make the intention clearer
> ;-), perhaps with a column for cpuset.cpus.effective.
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Michal Koutný 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
> After further consideration, I still suggest retaining this rule.

Apologies, I'm slightly lost as to which rule. I hope the new iteration from
Shaojie with both before/after tables will explain it.

> For am example:
>   Step                                       | A1's prstate | B1's prstate |
>   #1> mkdir -p A1                            | member       |              |
>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>   #4> mkdir -p B1                            | root         | member       |
>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
> 
> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
> invalidated?

A1 is invalidated because it doesn't have exclusive ownership of CPU 0
anymore.

> B1 could also use the parent's effective CPUs, right?

Here you assume some ordering between siblings treating A1 more
important than B1. But it's symmetrical in principle, no?

> This raises the question: Should we relax the restriction to allow a cpuset's cpus to be a subset of
> its siblings' exclusive_cpus, thereby keeping A1 valid? If we do this, users may struggle to
> understand what their cpuset.cpus.effective value is (and why it has that value)—contrary to their
> expectations.

Not only users. I think the struggle is reduced when
the resulting state (valid/invalid, effective) doesn't depend on the
order in which individual cgroups are configured.

0.02€,
Michal
Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/15 0:14, Michal Koutný wrote:
> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> After further consideration, I still suggest retaining this rule.
> 
> Apologies, I'm slightly lost which rule. I hope the new iteration from
> Shaojie with both before/after tables will explain it.
> 

The rule has changed in this patch from "If either cpuset is exclusive, check
if they are mutually exclusive" to "If both cpusets are exclusive, check if
they are mutually exclusive":

  -	/* If either cpuset is exclusive, check if they are mutually exclusive */
  -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
  +	/* If both cpusets are exclusive, check if they are mutually exclusive */
  +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
  +		return !cpusets_are_exclusive(cs1, cs2);
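
As a rough illustration of the difference, the following userspace sketch
compares the two predicates on the Table 1 setup (A1 exclusive with CPUs 0-1,
B1 a non-exclusive member requesting CPUs 0-3). It assumes one bit per CPU in
a 32-bit mask. All names (toy_cpuset, toy_cpu_range, toy_conflict_either,
toy_conflict_both) are hypothetical, and only the first check of
cpus_excl_conflict() is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset: bit n set means CPU n. */
struct toy_cpuset {
	uint32_t cpus;      /* CPUs the cpuset asks for */
	bool cpu_exclusive; /* models is_cpu_exclusive() */
};

/* Mask for CPUs lo..hi inclusive; assumes the range spans fewer than 32 CPUs. */
static uint32_t toy_cpu_range(unsigned int lo, unsigned int hi)
{
	return ((UINT32_C(1) << (hi - lo + 1)) - 1) << lo;
}

/* True when the two cpusets share no CPU. */
static bool toy_are_exclusive(const struct toy_cpuset *a,
			      const struct toy_cpuset *b)
{
	return (a->cpus & b->cpus) == 0;
}

/* Original rule: check overlap if either cpuset is exclusive. */
static bool toy_conflict_either(const struct toy_cpuset *a,
				const struct toy_cpuset *b)
{
	if (a->cpu_exclusive || b->cpu_exclusive)
		return !toy_are_exclusive(a, b);
	return false;
}

/* Rule in this patch for v2: check overlap only if both are exclusive. */
static bool toy_conflict_both(const struct toy_cpuset *a,
			      const struct toy_cpuset *b)
{
	if (a->cpu_exclusive && b->cpu_exclusive)
		return !toy_are_exclusive(a, b);
	return false;
}
```

For the Table 1 setup the original rule reports a conflict and the relaxed
rule does not. Once B1 is also exclusive, both rules report a conflict.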

I suggest not modifying this rule and keeping the original logic intact:

>> For am example:
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> mkdir -p A1                            | member       |              |
>>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>   #4> mkdir -p B1                            | root         | member       |
>>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>
>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>> invalidated?
> 
> A1 is invalidated becase it doesn't have exclusive ownership of CPU 0
> anymore.
> 
>> B1 could also use the parent's effective CPUs, right?
> 
> Here you assume some ordering between siblings treating A1 more
> important than B1. But it's symmetrical in principle, no?
> 

I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
following the same logic—but I’m not in favor of doing so.

>> This raises the question: Should we relax the restriction to allow a cpuset's cpus to be a subset of
>> its siblings' exclusive_cpus, thereby keeping A1 valid? If we do this, users may struggle to
>> understand what their cpuset.cpus.effective value is (and why it has that value)—contrary to their
>> expectations.
> 
> Not only users, not only users. I think struggle is reduced when
> the resulting state (valid/invalid, effective) doesn't depend on the
> order in which individual cgroups are configured.
> 
> 0.02€,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Posted by Waiman Long 2 months, 3 weeks ago
On 11/13/25 8:14 AM, Sun Shaojie wrote:
> In cgroup v2, a mutual overlap check is required when at least one of two
> cpusets is exclusive. However, this check should be relaxed and limited to
> cases where both cpusets are exclusive.
>
> The table 1 shows the partition states of A1 and B1 after each step before
> applying this patch.
>
> Table 1: Before applying the patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> mkdir -p A1                            | member       |              |
>   #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>   #4> mkdir -p B1                            | root         | member       |
>   #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>   #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>
> After step #5, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."
>
> This patch relaxes the exclusive cpuset check for cgroup v2 while
> preserving the current cgroup v1 behavior.
>
> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
>
> ---
> v1 -> v2:
>    - Keeps the current cgroup v1 behavior unchanged
>    - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat.com
> ---
>   kernel/cgroup/cpuset.c                            |  9 +++++++--
>   tools/testing/selftests/cgroup/test_cpuset_prs.sh | 10 +++++-----
>   2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..3240b3ab5998 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -592,8 +592,13 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>    */
>   static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>   {
> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
> +		return !cpusets_are_exclusive(cs1, cs2);
> +
> +	/* In cgroup-v1, if either cpuset is exclusive, check if they are mutually exclusive */
> +	if (!is_in_v2_mode() &&

You should just use cpuset_v2() here, as is_in_v2_mode() checks an 
additional v1-specific mode that is irrelevant wrt exclusive bit 
handling. Also please update the function comment to describe the 
difference in v1 vs. v2 behavior.

Note that we may have to update other conflict checking code in cpuset.c 
to make this new behavior more consistent.
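
A small userspace model of why the choice of helper matters, under the
assumption that is_in_v2_mode() is true whenever cpuset_v2() is true or the
additional v1-specific mode is enabled. The function and parameter names
below are hypothetical and only model the boolean logic, not the kernel
helpers themselves.

```c
#include <stdbool.h>

/* Models cpuset_v2(): true only on the v2 (default) hierarchy. */
static bool model_cpuset_v2(bool on_default_hierarchy)
{
	return on_default_hierarchy;
}

/*
 * Models is_in_v2_mode(): also true on a v1 hierarchy when the
 * v1-specific mode (v1_v2_mode, an assumed flag) is enabled.
 */
static bool model_is_in_v2_mode(bool on_default_hierarchy, bool v1_v2_mode)
{
	return model_cpuset_v2(on_default_hierarchy) || v1_v2_mode;
}

/* Does the v1 "either cpuset is exclusive" rule apply, as written in v2 of the patch? */
static bool model_v1_rule_as_posted(bool on_default_hierarchy, bool v1_v2_mode)
{
	return !model_is_in_v2_mode(on_default_hierarchy, v1_v2_mode);
}

/* Does the v1 rule apply when the guard uses cpuset_v2() instead? */
static bool model_v1_rule_suggested(bool on_default_hierarchy)
{
	return !model_cpuset_v2(on_default_hierarchy);
}
```

In this model, a v1 hierarchy with the extra mode enabled would skip the v1
rule under the posted guard but keep it under the suggested one.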

Thanks,
Longman

> +	    (is_cpu_exclusive(cs1) != is_cpu_exclusive(cs2)))
>   		return !cpusets_are_exclusive(cs1, cs2);
>   
>   	/* Exclusive_cpus cannot intersect */
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..903dddfe88d7 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> -	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> +	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2 2-3"
>   	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
>   	# Invalid to valid local partition direct transition tests
>   	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
>   	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> -	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> +	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0 0-4"
>   	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>   
>   	# Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
>   	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
>   	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>   
> -	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> -	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> +	# A non-exclusive cpuset.cpus change will not invalidate partition and its siblings
> +	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
>   	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> -	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> +	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>   
>   	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
>   	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"