[PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Sun Shaojie posted 1 patch 1 week, 5 days ago
[PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 5 days ago
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with its sibling partition, the sibling's partition state becomes invalid.
However, this invalidation is often unnecessary. If the cpuset being
modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases:

Assume the machine has 4 CPUs (0-3).

   root cgroup
      /    \
    A1      B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

 Table 1.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs
(0-1) overlap with those requested by B1 (0). However, B1 can actually
use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to
remain as "root".

 Table 1.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

 Table 2.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
regardless of what conflicting value B1 writes to cpuset.cpus, it will
always have at least CPU 2 available. This makes it unnecessary to mark
A1 as "root invalid".

 Table 2.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will
always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
is no need to change A1 from "root" to "root invalid".

All other cases, such as cgroup-v1, remain unaffected.

Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
---
 kernel/cgroup/cpuset.c                        | 19 +------------------
 .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..f6a834335ebf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..7d8941f65d84 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -388,10 +388,11 @@ TEST_MATRIX=(
 	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	# An exclusive cpuset.cpus change will invalidate itself.
 	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
-- 
2.25.1
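
For readers who want to try the two behaviours outside the kernel, the
following is a minimal user-space sketch of what the hunk in
cpus_allowed_validate_change() changes. Everything in it is hypothetical:
struct toy_cpuset, toy_write_cpus(), the 32-bit masks, and the simplified
notion of a conflict (any overlap with a sibling that is a valid partition
root) are illustration only and not kernel code. With invalidate_siblings
set it mimics the removed loop; with it cleared it mimics the patched code,
where only the writer pays for the conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy partition states, mirroring "member", "root" and "root invalid". */
enum toy_prs { TOY_MEMBER, TOY_ROOT, TOY_ROOT_INVALID };

/* Hypothetical stand-in for a cpuset: one bit per CPU, bit 0 = CPU 0. */
struct toy_cpuset {
    uint32_t cpus;
    enum toy_prs prs;
};

/*
 * Model a cpuset.cpus write by 'writer'. The write itself always goes
 * through. If it overlaps a sibling that is a valid partition root, the
 * writer loses its own valid-root state (if it had one). Only when
 * invalidate_siblings is true (pre-patch behaviour) are the overlapping
 * siblings invalidated as well. Returns the number of siblings invalidated.
 */
static int toy_write_cpus(struct toy_cpuset *writer, uint32_t new_cpus,
                          struct toy_cpuset *siblings[], size_t n,
                          bool invalidate_siblings)
{
    int invalidated = 0;
    bool conflict = false;

    for (size_t i = 0; i < n; i++) {
        struct toy_cpuset *sib = siblings[i];

        if (sib->prs != TOY_ROOT || !(new_cpus & sib->cpus))
            continue;
        conflict = true;
        if (invalidate_siblings) {
            sib->prs = TOY_ROOT_INVALID;
            invalidated++;
        }
    }

    writer->cpus = new_cpus;
    if (conflict && writer->prs == TOY_ROOT)
        writer->prs = TOY_ROOT_INVALID;
    return invalidated;
}
```

Replaying step #5 of Case 2 (A1 = 0x3 as root, B1 = 0x4 as root, B1 writes
0x6) leaves both cpusets as "root invalid" with invalidate_siblings set,
and only B1 as "root invalid" with it cleared, matching Tables 2.1 and 2.2.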

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 1 week ago
On 11/19/25 5:57 AM, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
>
> This patch applies only to the following two cases:
>
> Assume the machine has 4 CPUs (0-3).
>
>     root cgroup
>        /    \
>      A1      B1
>
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>
>   Table 1.1: Before applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
>
>   Table 1.2: After applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>
>   Table 2.1: Before applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>   #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
>
>   Table 2.2: After applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>   #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".
>
> All other cases remain unaffected. For example, cgroup-v1.

This patch is relatively simple. As others have pointed out, there is
an inconsistency depending on the operation ordering.

In the example above, the final configuration is A1:0-1 & B1:1-2. As the 
cpu lists overlap, we can't have both of them as valid partition roots. 
So either one of A1 or B1 is valid or they are both invalid. The current 
code makes them both invalid no matter the operation ordering.  This 
patch will make one of them valid given the operation ordering above. To 
minimize partition invalidation, we will have to live with the fact that 
it will be first-come first-serve as noted by Michal. I am not against 
this, we just have to document it. However, the following operation 
order will still make both of them invalid:

# echo "0-1" > A1/cpuset.cpus
# echo "2" > B1/cpuset.cpus
# echo "1-2" > B1/cpuset.cpus
# echo "root" > A1/cpuset.cpus.partition
# echo "root" > B1/cpuset.cpus.partition

To follow the "first-come first-serve" rule, A1 should be valid and B1 
invalid. That is the inconsistency with your current patch. To fix that, 
we still need to relax the overlap checking rule similar to your v4 patch.

Cheers,
Longman
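
The ordering problem described above can be reproduced with a small
user-space model. This is only a sketch under stated assumptions: the
struct, the function name toy_enable_root() and both rules are
hypothetical, chosen to reproduce the outcomes described in this mail
rather than to mirror the kernel's actual checks. The "strict" rule lets
any sibling whose cpuset.cpus overlaps block a new partition root; the
"fcfs" rule lets only siblings that are already valid roots block it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_prs { TOY_MEMBER, TOY_ROOT, TOY_ROOT_INVALID };

/* Hypothetical stand-in for a cpuset: one bit per CPU, bit 0 = CPU 0. */
struct toy_cpuset {
    uint32_t cpus;
    enum toy_prs prs;
};

/*
 * Model writing "root" to cpuset.cpus.partition of 'cs'.
 *   fcfs == false: any sibling with overlapping CPUs blocks the request.
 *   fcfs == true : only an overlapping sibling that is already a valid
 *                  partition root blocks the request.
 * Returns the resulting partition state of 'cs'.
 */
static enum toy_prs toy_enable_root(struct toy_cpuset *cs,
                                    struct toy_cpuset *siblings[], size_t n,
                                    bool fcfs)
{
    bool blocked = false;

    for (size_t i = 0; i < n; i++) {
        const struct toy_cpuset *sib = siblings[i];

        if (!(cs->cpus & sib->cpus))
            continue;               /* no shared CPU, no conflict */
        if (!fcfs || sib->prs == TOY_ROOT)
            blocked = true;
    }

    cs->prs = blocked ? TOY_ROOT_INVALID : TOY_ROOT;
    return cs->prs;
}
```

With A1 = 0-1 (0x3) and B1 = 1-2 (0x6) already written, enabling A1 and
then B1 ends with both invalid under the strict rule, and with A1 valid
and B1 invalid under the first-come first-serve rule.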

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 5 days, 11 hours ago
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
> lists overlap, we can't have both of them as valid partition roots. So
> either one of A1 or B1 is valid or they are both invalid. The current code
> makes them both invalid no matter the operation ordering.  This patch will
> make one of them valid given the operation ordering above. To minimize
> partition invalidation, we will have to live with the fact that it will be
> first-come first-serve as noted by Michal. I am not against this, we just
> have to document it. However, the following operation order will still make
> both of them invalid:

I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
race conditions in practice.
BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
in this regard?

Thanks,
Michal

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 5 days, 6 hours ago
On 11/26/25 9:13 AM, Michal Koutný wrote:
> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>> lists overlap, we can't have both of them as valid partition roots. So
>> either one of A1 or B1 is valid or they are both invalid. The current code
>> makes them both invalid no matter the operation ordering.  This patch will
>> make one of them valid given the operation ordering above. To minimize
>> partition invalidation, we will have to live with the fact that it will be
>> first-come first-serve as noted by Michal. I am not against this, we just
>> have to document it. However, the following operation order will still make
>> both of them invalid:
> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
> race conditions in practice.
> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
> in this regard?

Modifications to cpumasks are all serialized by the cpuset_mutex. If you
are referring to 2 or more tasks doing parallel updates to various
cpuset control files of sibling cpusets, the results can actually vary
depending on the order in which those operations end up being serialized.

One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact
that operations on cpuset.cpus.exclusive can fail if the result is not
exclusive WRT sibling cpusets, but becoming a valid partition is
guaranteed unless none of the exclusive CPUs are passed down from the
parent. The use of cpuset.cpus.exclusive is required for creating a remote
partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid
partition root is not guaranteed, and cpuset.cpus alone can only be used
to create a local partition.

Does that answer your question?

Cheers,
Longman
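
The contrast between the two control files can be sketched in user space
as follows. This is an illustration only: struct toy_cpuset and both
functions are hypothetical names, and the conflict checks are reduced to
plain mask intersection, which is much simpler than what the kernel does.
The point is the failure mode: the cpuset.cpus.exclusive model rejects a
conflicting write and leaves the state untouched, while the cpuset.cpus
model always accepts the write and may drop partition validity instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per CPU, bit 0 = CPU 0. */
struct toy_cpuset {
    uint32_t cpus;          /* models cpuset.cpus */
    uint32_t exclusive;     /* models cpuset.cpus.exclusive */
    bool partition_valid;   /* true while the cpuset is a valid root */
};

/* Models a cpuset.cpus.exclusive write: refused on a sibling conflict. */
static int toy_write_exclusive(struct toy_cpuset *cs, uint32_t new_excl,
                               struct toy_cpuset *siblings[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (new_excl & siblings[i]->exclusive)
            return -EINVAL;     /* nothing is changed on failure */

    cs->exclusive = new_excl;
    return 0;
}

/* Models a cpuset.cpus write: always accepted, validity not guaranteed. */
static int toy_write_cpus(struct toy_cpuset *cs, uint32_t new_cpus,
                          struct toy_cpuset *siblings[], size_t n)
{
    cs->cpus = new_cpus;
    for (size_t i = 0; i < n; i++)
        if (siblings[i]->partition_valid && (new_cpus & siblings[i]->cpus))
            cs->partition_valid = false;
    return 0;
}
```

A caller therefore has to check the return value in the first case and
re-read the partition state in the second.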

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 5 days ago

On 2025/11/27 3:43, Waiman Long wrote:
> On 11/26/25 9:13 AM, Michal Koutný wrote:
>> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
>>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>>> lists overlap, we can't have both of them as valid partition roots. So
>>> either one of A1 or B1 is valid or they are both invalid. The current code
>>> makes them both invalid no matter the operation ordering.  This patch will

I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
make it more difficult for users to understand why the cpuset.cpus they configured do not match the
effective CPUs in use, and why different operation orders yield different results.

In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

   root cgroup
        |
       A1
      /  \
    A2    A3...

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs


[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
requirement?" --Michal.

>>> make one of them valid given the operation ordering above. To minimize
>>> partition invalidation, we will have to live with the fact that it will be
>>> first-come first-serve as noted by Michal. I am not against this, we just
>>> have to document it. However, the following operation order will still make
>>> both of them invalid:
>> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
>> race conditions in practice.
>> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
>> in this regard?
> 
> Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more
> tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can
> actually vary depending on the actual serialization results of those operations.
> 
> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on
> cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a
> valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The
> use of cpuset.cpus.exclusive is required for creating remote partition.
> 
> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed
> and is limited to the creation of local partition only.
> 
> Does that answer your question?
> 
> Cheers,
> Longman
> 

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 16 hours ago
Hi, Ridong,

On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:
>I have to admit that I prefer the current implementation.
>
>At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
>make it more difficult for users to understand why the cpuset.cpus they configured do not match the
>effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", below is an
example that is not a corner case.

    root cgroup
      /    \
     A1    B1

 #1> echo "0" > A1/cpuset.cpus
 #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

 #1> echo "0-1" > B1/cpuset.cpus.exclusive
 #2> echo "0" > A1/cpuset.cpus

>
>In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
>created under A1 will end up with empty effective CPUs—and this is not a desired behavior.
>
>   root cgroup
>        |
>       A1
>      /  \
>    A2    A3...
>
> #1> echo "0-1" > A1/cpuset.cpus
> #2> echo "root" > A1/cpuset.cpus.partition
> #3> echo "0-1" > A2/cpuset.cpus
> #4> echo "root" > A2/cpuset.cpus.partition
> mkdir A4
> mkdir A5
> echo "0" > A4/cpuset.cpus
> echo $$ > A4/cgroup.procs
> echo "1" > A5/cpuset.cpus
> echo $$ > A5/cgroup.procs
>

If A2...A5 all belong to the same user, and that user wants both A4 and A5 
to have effective CPUs, then the user should also understand that A2 needs
to be adjusted to "member" instead of "root".

If A2...A5 belong to different users, must satisfying user A4's requirement
come at the expense of user A2's requirement? That is not fair.

>
>[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
>requirement?" --Michal.

Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 5 days, 13 hours ago
Hi, Longman,

On Mon, 24 Nov 2025 17:30:47, Waiman Long wrote:
>On 11/19/25 5:57 AM, Sun Shaojie wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>>
>> This patch applies only to the following two cases:
>>
>> Assume the machine has 4 CPUs (0-3).
>>
>>     root cgroup
>>        /    \
>>      A1      B1
>>
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>>
>>   Table 1.1: Before applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>>
>>   Table 1.2: After applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>>
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>>
>>   Table 2.1: Before applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>   #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>>
>>   Table 2.2: After applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>   #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
>>
>> All other cases remain unaffected. For example, cgroup-v1.
>
>This patch is relatively simple. As others have pointed out, there are 
>inconsistency depending on the operation ordering.
>
>In the example above, the final configuration is A1:0-1 & B1:1-2. As the 
>cpu lists overlap, we can't have both of them as valid partition roots. 
>So either one of A1 or B1 is valid or they are both invalid. The current 
>code makes them both invalid no matter the operation ordering.  This 
>patch will make one of them valid given the operation ordering above. To 
>minimize partition invalidation, we will have to live with the fact that 
>it will be first-come first-serve as noted by Michal. I am not against 
>this, we just have to document it. However, the following operation 
>order will still make both of them invalid:
>
># echo "0-1" > A1/cpuset.cpus
># echo "2" > B1/cpuset.cpus
># echo "1-2" > B1/cpuset.cpus
># echo "root" > A1/cpuset.cpus.partition
># echo "root" > B1/cpuset.cpus.partition
>
>To follow the "first-come first-serve" rule, A1 should be valid and B1 
>invalid. That is the inconsistency with your current patch. To fix that, 
>we still need to relax the overlap checking rule similar to your v4 patch.

Thank you for your suggestion! Will update.

Thanks,
Sun Shaojie

[PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 16 hours ago
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with its sibling partition, the sibling's partition state becomes invalid.
However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
under the root cgroup. Each cpuset is used by a single user (user-1 uses
A1, ..., user-m uses Am), and the partition states of these cpusets are
configured as follows:

                           root cgroup
        /             /                  \                 \
       A1            A2        ...       An                Am
     (root)        (root)      ...     (root) (root/root invalid/member)

Assume that A1 through Am have not set cpuset.cpus.exclusive. When
user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
states from A1 to An to change from root to root invalid, as shown
below.

                           root cgroup
        /              /                 \                 \
       A1             A2       ...       An                Am
 (root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.

This patch prevents such outcomes by ensuring that modifications to
cpuset.cpus do not affect the partition state of other sibling cpusets.
Therefore, with this patch applied, when user-m configures Am's
cpuset.cpus to "0-127", the result will be as follows.

                           root cgroup
        /             /                  \                 \
       A1            A2        ...       An                Am
     (root)        (root)      ...     (root)     (root invalid/member)

It is worth noting that, since this patch enforces the exclusivity of
sibling cpusets, setting exclusivity now follows a "first-come,
first-served" principle.

For example, consider the following four steps: before applying this
patch, regardless of the order in which they are executed, the final
partition state of both A1 and B1 would always be "root invalid."

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

After applying this patch, the first party to set "root" will maintain
its exclusive validity. As follows:

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
 #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
 #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |

In summary, if the current cpuset conflicts with its sibling cpusets on
exclusive CPUs, only the current cpuset should bear the consequences.
(If a cpuset is exclusive and its exclusive CPUs are empty, its allowed
CPUs are treated as its exclusive CPUs.)
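
As an illustration, the cgroup-v2 conflict rules that this patch gives to
cpus_excl_conflict() can be modelled in user space as below. This is a
sketch, not the kernel function: struct toy_cpuset and the toy_* helpers
are hypothetical, masks are plain 32-bit integers, toy_user_xcpus() assumes
that the exclusive CPUs fall back to the allowed CPUs when empty, and the
non-empty guard in rule 3 is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per CPU, bit 0 = CPU 0. */
struct toy_cpuset {
    uint32_t cpus_allowed;    /* models cpuset.cpus */
    uint32_t exclusive_cpus;  /* models cpuset.cpus.exclusive */
    bool cpu_exclusive;       /* models is_cpu_exclusive() */
};

/* Assumed fallback: use the allowed CPUs when no exclusive CPUs are set. */
static uint32_t toy_user_xcpus(const struct toy_cpuset *cs)
{
    return cs->exclusive_cpus ? cs->exclusive_cpus : cs->cpus_allowed;
}

/* True if 'sub' is non-empty and every CPU in it is also in 'super'. */
static bool toy_nonempty_subset(uint32_t sub, uint32_t super)
{
    return sub != 0 && (sub & ~super) == 0;
}

static bool toy_v2_excl_conflict(const struct toy_cpuset *a,
                                 const struct toy_cpuset *b)
{
    /* Rule 1: two exclusive cpusets must not share any exclusive CPU. */
    if (a->cpu_exclusive && b->cpu_exclusive)
        return (toy_user_xcpus(a) & toy_user_xcpus(b)) != 0;

    /* Rule 2: the exclusive_cpus masks must not intersect. */
    if (a->exclusive_cpus & b->exclusive_cpus)
        return true;

    /* Rule 3: allowed CPUs must not be a subset of the other's exclusive CPUs. */
    if (toy_nonempty_subset(a->cpus_allowed, b->exclusive_cpus) ||
        toy_nonempty_subset(b->cpus_allowed, a->exclusive_cpus))
        return true;

    /* Rule 4: an exclusive cpuset without exclusive CPUs protects its allowed CPUs. */
    if (a->cpu_exclusive && !a->exclusive_cpus &&
        (a->cpus_allowed & b->exclusive_cpus))
        return true;
    if (b->cpu_exclusive && !b->exclusive_cpus &&
        (b->cpus_allowed & a->exclusive_cpus))
        return true;

    return false;
}
```

In this model an exclusive A1 with CPUs 0-1 does not conflict with a
non-exclusive B1 that merely lists CPUs 1-2, but it does conflict once B1
is exclusive as well, or once B1 requests CPUs 1-2 as exclusive CPUs.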

Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
---
 kernel/cgroup/cpuset-internal.h               |  3 +
 kernel/cgroup/cpuset-v1.c                     | 19 ++++++
 kernel/cgroup/cpuset.c                        | 60 ++++++++++++-------
 .../selftests/cgroup/test_cpuset_prs.sh       | 12 ++--
 4 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..c53111998432 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) {return false; }
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..5aa0ac092ef6 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		return cpumask_intersects(cs1->cpus_allowed,
+					  cs2->cpus_allowed);
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..e58dd26e074a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
  * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs
+ *    will be treated as exclusive CPUs; therefore, its allowed CPUs must not
+ *    intersect with another's exclusive CPUs.
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* For cgroup-v1 */
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(cs1, cs2);
+
+	/* If cpusets are exclusive, check if they are mutually exclusive*/
+	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
 		return !cpusets_are_exclusive(cs1, cs2);
 
 	/* Exclusive_cpus cannot intersect */
@@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;
 
+	/*
+	 * When a cpuset is exclusive and its exclusive CPUs are empty,
+	 * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
+	 */
+	if (is_cpu_exclusive(cs1) &&
+	    cpumask_empty(cs1->exclusive_cpus) &&
+	    cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
+		return true;
+
+	if (is_cpu_exclusive(cs2) &&
+	    cpumask_empty(cs2->exclusive_cpus) &&
+	    cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
+		return true;
+
 	return false;
 }
 
@@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
@@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_tmpmasks(&tmp))
 		return -ENOMEM;
 
-	compute_trialcs_excpus(trialcs, cs);
-	trialcs->prs_err = PERR_NONE;
+	/*
+	 * if there is exclusive CPUs conflict with the siblings,
+	 * we still allow the cpumask change to proceed while
+	 * invalidating the partition.
+	 */
+	if (compute_trialcs_excpus(trialcs, cs))
+		trialcs->prs_err = PERR_NOTEXCL;
+	else
+		trialcs->prs_err = PERR_NONE;
 
 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
 	if (retval < 0)
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..75154e22c702 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
 	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0"
 	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
 
 	# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
 	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
+	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
-- 
2.25.1
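
As a rough illustration of rule 4 from the cpus_excl_conflict() hunk above, the two symmetric checks can be modelled in userspace C with plain bitmasks. This is only a sketch under stated assumptions: `struct excl_model`, `rule4_one_way()` and `rule4_conflict()` are hypothetical names, bit n stands for CPU n, and the other conflict rules in the kernel function are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct cpuset; bit n = CPU n. */
struct excl_model {
	bool cpu_exclusive;      /* models is_cpu_exclusive(cs) */
	uint64_t cpus_allowed;   /* models cs->cpus_allowed */
	uint64_t exclusive_cpus; /* models cs->exclusive_cpus */
};

/*
 * One direction of rule 4: "a" is exclusive with empty exclusive_cpus,
 * so its cpus_allowed is treated as exclusive and must not intersect
 * b's exclusive_cpus.
 */
static bool rule4_one_way(const struct excl_model *a,
			  const struct excl_model *b)
{
	assert(a && b);
	return a->cpu_exclusive &&
	       a->exclusive_cpus == 0 &&
	       (a->cpus_allowed & b->exclusive_cpus) != 0;
}

/* The hunk applies the check in both directions (cs1 vs cs2, cs2 vs cs1). */
static bool rule4_conflict(const struct excl_model *cs1,
			   const struct excl_model *cs2)
{
	return rule4_one_way(cs1, cs2) || rule4_one_way(cs2, cs1);
}
```

With cs1 exclusive on CPUs 0-1 (mask 0x3, empty exclusive_cpus) and cs2 holding exclusive_cpus = CPU 1 (mask 0x2), the model reports a conflict; moving cs2's exclusive_cpus to CPU 2 (mask 0x4) clears it.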
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 5 days ago

On 2025/11/19 18:57, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
> 
> This patch applies only to the following two cases:
> 
> Assume the machine has 4 CPUs (0-3).
> 
>    root cgroup
>       /    \
>     A1      B1
> 
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
> 
>  Table 1.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
> 
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
> 
>  Table 1.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
> 
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
> 
>  Table 2.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
> 
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
> 
>  Table 2.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
> 
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".
> 
> All other cases remain unaffected. For example, cgroup-v1.
> 
> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
> ---
>  kernel/cgroup/cpuset.c                        | 19 +------------------
>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>  2 files changed, 5 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..f6a834335ebf 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>  					struct tmpmasks *tmp)
>  {
>  	int retval;
> -	struct cpuset *parent = parent_cs(cs);
>  
>  	retval = validate_change(cs, trialcs);
>  
>  	if ((retval == -EINVAL) && cpuset_v2()) {
> -		struct cgroup_subsys_state *css;
> -		struct cpuset *cp;
> -
>  		/*
>  		 * The -EINVAL error code indicates that partition sibling
>  		 * CPU exclusivity rule has been violated. We still allow
>  		 * the cpumask change to proceed while invalidating the
> -		 * partition. However, any conflicting sibling partitions
> -		 * have to be marked as invalid too.
> +		 * partition.
>  		 */
>  		trialcs->prs_err = PERR_NOTEXCL;
> -		rcu_read_lock();
> -		cpuset_for_each_child(cp, css, parent) {
> -			struct cpumask *xcpus = user_xcpus(trialcs);
> -
> -			if (is_partition_valid(cp) &&
> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
> -				rcu_read_unlock();
> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> -				rcu_read_lock();
> -			}
> -		}
> -		rcu_read_unlock();
>  		retval = 0;
>  	}
>  	return retval;

If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
disable its siblings' partitions.
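
For reference, the sibling-invalidation loop that the hunk above removes can be sketched in userspace C as follows. This is a simplified model, not the kernel code: `struct sibling_model` and `invalidate_conflicting_siblings()` are hypothetical names, bit n stands for CPU n, `requested` stands in for the mask returned by user_xcpus(trialcs), and clearing a flag stands in for update_parent_effective_cpumask(cp, partcmd_invalidate, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a sibling cpuset; bit n = CPU n. */
struct sibling_model {
	bool partition_valid;     /* models is_partition_valid(cp) */
	uint64_t effective_xcpus; /* models cp->effective_xcpus */
};

/*
 * Models the removed loop: every valid sibling partition whose
 * effective_xcpus intersects the requested CPUs is invalidated.
 * Returns the number of siblings that were invalidated.
 */
static int invalidate_conflicting_siblings(struct sibling_model *siblings,
					   size_t n, uint64_t requested)
{
	int invalidated = 0;

	assert(siblings != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (siblings[i].partition_valid &&
		    (requested & siblings[i].effective_xcpus) != 0) {
			siblings[i].partition_valid = false;
			invalidated++;
		}
	}
	return invalidated;
}
```

Assuming A1 is a valid partition whose effective_xcpus is CPUs 0-1 (mask 0x3), a request for CPU 0 (mask 0x1) from B1 invalidates A1 in this model, which is the pre-patch outcome shown in Table 1.1. With the loop removed, no sibling is touched.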

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 4 days ago
Hi, Ridong,

On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
>On 2025/11/19 18:57, Sun Shaojie wrote:
>>  kernel/cgroup/cpuset.c                        | 19 +------------------
>>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>>  2 files changed, 5 insertions(+), 21 deletions(-)
>> 
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 52468d2c178a..f6a834335ebf 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>>  					struct tmpmasks *tmp)
>>  {
>>  	int retval;
>> -	struct cpuset *parent = parent_cs(cs);
>>  
>>  	retval = validate_change(cs, trialcs);
>>  
>>  	if ((retval == -EINVAL) && cpuset_v2()) {
>> -		struct cgroup_subsys_state *css;
>> -		struct cpuset *cp;
>> -
>>  		/*
>>  		 * The -EINVAL error code indicates that partition sibling
>>  		 * CPU exclusivity rule has been violated. We still allow
>>  		 * the cpumask change to proceed while invalidating the
>> -		 * partition. However, any conflicting sibling partitions
>> -		 * have to be marked as invalid too.
>> +		 * partition.
>>  		 */
>>  		trialcs->prs_err = PERR_NOTEXCL;
>> -		rcu_read_lock();
>> -		cpuset_for_each_child(cp, css, parent) {
>> -			struct cpumask *xcpus = user_xcpus(trialcs);
>> -
>> -			if (is_partition_valid(cp) &&
>> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
>> -				rcu_read_unlock();
>> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
>> -				rcu_read_lock();
>> -			}
>> -		}
>> -		rcu_read_unlock();
>>  		retval = 0;
>>  	}
>>  	return retval;
>
>If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
>effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
>disable its siblings' partitions.

I have carefully considered the scenario where parent effective CPUs are 
empty, which corresponds to the following two cases (after applying this patch).

   root cgroup
        |
       A1
      /  \
    A2    A3

Case 1:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition

 After step #4, 

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0-1          |              |
 effective_cpus |              | 0-1          |              |
 prstate        | root         | root         | member       |

 After step #4, A3's effective CPUs are empty.

 #5> echo "0-1" > A3/cpuset.cpus

 After step #5,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0-1          | 0-1          |
 effective_cpus |              | 0-1          |              |
 prstate        | root         | root         | member       |

This patch affects step #5. After step #5, A3's effective CPUs are also empty.
Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus),
it is acceptable for them to remain empty after step #5. Moreover, if A3 is
aware that its parent's effective CPUs are empty, it should understand that
the CPUs it requests may not be granted.

Case 2:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 #5> echo "1" > A3/cpuset.cpus
 #6> echo "root" > A3/cpuset.cpus.partition

 After step #6,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0            | 1            |
 effective_cpus |              | 0            | 1            |
 prstate        | root         | root         | root         |

 #7> echo "0-1" > A3/cpuset.cpus

 After step #7,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0            | 0-1          |
 effective_cpus | 1            | 0            | 1            |
 prstate        | root         | root         | root invalid |

This patch affects step #7. After step #7, A3 only affects itself, changing
from "root" to "root invalid". However, since its effective CPUs remain 1 
both before and after step #7, it doesn't matter even if A2 is not invalidated.

The purpose of this patch is to ensure that modifying cpuset.cpus does not 
disable its siblings' partitions.


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 4 days ago

On 2025/11/20 21:07, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
>> On 2025/11/19 18:57, Sun Shaojie wrote:
>>>  kernel/cgroup/cpuset.c                        | 19 +------------------
>>>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>>>  2 files changed, 5 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 52468d2c178a..f6a834335ebf 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>>>  					struct tmpmasks *tmp)
>>>  {
>>>  	int retval;
>>> -	struct cpuset *parent = parent_cs(cs);
>>>  
>>>  	retval = validate_change(cs, trialcs);
>>>  
>>>  	if ((retval == -EINVAL) && cpuset_v2()) {
>>> -		struct cgroup_subsys_state *css;
>>> -		struct cpuset *cp;
>>> -
>>>  		/*
>>>  		 * The -EINVAL error code indicates that partition sibling
>>>  		 * CPU exclusivity rule has been violated. We still allow
>>>  		 * the cpumask change to proceed while invalidating the
>>> -		 * partition. However, any conflicting sibling partitions
>>> -		 * have to be marked as invalid too.
>>> +		 * partition.
>>>  		 */
>>>  		trialcs->prs_err = PERR_NOTEXCL;
>>> -		rcu_read_lock();
>>> -		cpuset_for_each_child(cp, css, parent) {
>>> -			struct cpumask *xcpus = user_xcpus(trialcs);
>>> -
>>> -			if (is_partition_valid(cp) &&
>>> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
>>> -				rcu_read_unlock();
>>> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
>>> -				rcu_read_lock();
>>> -			}
>>> -		}
>>> -		rcu_read_unlock();
>>>  		retval = 0;
>>>  	}
>>>  	return retval;
>>
>> If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
>> effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
>> disable its siblings' partitions.
> 
> I have carefully considered the scenario where parent effective CPUs are 
> empty, which corresponds to the following two cases. (After apply this patch).
> 
>    root cgroup
>         |
>        A1
>       /  \
>     A2    A3
> 
> Case 1:
>  Step:
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0-1" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
> 
>  After step #4, 
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0-1          |              |
>  effective_cpus |              | 0-1          |              |
>  prstate        | root         | root         | member       |
> 
>  After step #4, A3's effective CPUs is empty.
> 

That may be an unexpected corner case.

>  #5> echo "0-1" > A3/cpuset.cpus
> 

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
A5, ...) afterward. However, prior to your patch, this migration was allowed.

>  After step #5,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0-1          | 0-1          |
>  effective_cpus |              | 0-1          |              |
>  prstate        | root         | root         | member       |
> 
> This patch affects step #5. After step #5, A3's effective CPUs is also empty.
> Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus),
> it is acceptable for them to remain empty after step #5. Moreover, if A3 is
> aware that its parent's effective CPUs are empty, it should understand that
> the CPUs it requests may not be granted.
> 
> Case 2:
>  Step:
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
>  #5> echo "1" > A3/cpuset.cpus
>  #6> echo "root" > A3/cpuset.cpus.partition
> 
>  After step #6,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0            | 1            |
>  effective_cpus |              | 0            | 1            |
>  prstate        | root         | root         | root         |
> 
>  #7> echo "0-1" > A3/cpuset.cpus
> 
>  After step #7,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0            | 0-1          |
>  effective_cpus | 1            | 0            | 1            |
>  prstate        | root         | root         | root invalid |
> 
> This patch affects step #7. After step #7, A3 only affects itself, changing
> from "root" to "root invalid". However, since its effective CPUs remain 1 
> both before and after step #7, it doesn't matter even if A2 is not invalidated.
> 
> The purpose of this patch is to ensure that modifying cpuset.cpus does not 
> disable its siblings' partitions.
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 3 days ago
Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>On 2025/11/20 21:07, Sun Shaojie wrote:
>> I have carefully considered the scenario where parent effective CPUs are 
>> empty, which corresponds to the following two cases. (After apply this patch).
>> 
>>    root cgroup
>>         |
>>        A1
>>       /  \
>>     A2    A3
>> 
>> Case 1:
>>  Step:
>>  #1> echo "0-1" > A1/cpuset.cpus
>>  #2> echo "root" > A1/cpuset.cpus.partition
>>  #3> echo "0-1" > A2/cpuset.cpus
>>  #4> echo "root" > A2/cpuset.cpus.partition
>> 
>>  After step #4, 
>> 
>>                 |      A1      |      A2      |      A3      |
>>  cpus_allowed   | 0-1          | 0-1          |              |
>>  effective_cpus |              | 0-1          |              |
>>  prstate        | root         | root         | member       |
>> 
>>  After step #4, A3's effective CPUs is empty.
>> 
>
>That may be a corner case is unexpected.
>
>>  #5> echo "0-1" > A3/cpuset.cpus
>> 
>
>If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>A5, ...) afterward. However, prior to your patch, this migration was allowed.

Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
And what parameters should be configured for A1's cpuset.cpus?
Could you provide a specific example?

Additionally, processes cannot be migrated into a cgroup whose 
cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

>>  After step #5,
>> 
>>                 |      A1      |      A2      |      A3      |
>>  cpus_allowed   | 0-1          | 0-1          | 0-1          |
>>  effective_cpus |              | 0-1          |              |
>>  prstate        | root         | root         | member       |
>> 


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 3 days ago

On 2025/11/21 18:32, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>> I have carefully considered the scenario where parent effective CPUs are 
>>> empty, which corresponds to the following two cases. (After apply this patch).
>>>
>>>    root cgroup
>>>         |
>>>        A1
>>>       /  \
>>>     A2    A3
>>>
>>> Case 1:
>>>  Step:
>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>
>>>  After step #4, 
>>>
>>>                 |      A1      |      A2      |      A3      |
>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>  effective_cpus |              | 0-1          |              |
>>>  prstate        | root         | root         | member       |
>>>
>>>  After step #4, A3's effective CPUs is empty.
>>>
>>
>> That may be a corner case is unexpected.
>>
>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>
>>
>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
> 
> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
> And what parameters should be configured for A1's cpuset.cpus?
> Could you provide a specific example?
> 

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs

You might argue that we haven't set the cpus for A4/A5, ...; yes, this may be a corner case.

However, it’s common practice to configure a cpuset’s cpus first and then migrate processes; this is
a typical usage scenario.


> Additionally, processes cannot be migrated into a cgroup whose 
> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
> 
> So why does applying this patch enable such migration?
> 


-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week ago
Hi, Ridong,

On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
>On 2025/11/21 18:32, Sun Shaojie wrote:
>> Hi, Ridong,
>> 
>> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>>> I have carefully considered the scenario where parent effective CPUs are 
>>>> empty, which corresponds to the following two cases. (After apply this patch).
>>>>
>>>>    root cgroup
>>>>         |
>>>>        A1
>>>>       /  \
>>>>     A2    A3
>>>>
>>>> Case 1:
>>>>  Step:
>>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>>
>>>>  After step #4, 
>>>>
>>>>                 |      A1      |      A2      |      A3      |
>>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>>  effective_cpus |              | 0-1          |              |
>>>>  prstate        | root         | root         | member       |
>>>>
>>>>  After step #4, A3's effective CPUs is empty.
>>>>
>>>
>>> That may be a corner case is unexpected.
>>>
>>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>>
>>>
>>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
>> 
>> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
>> And what parameters should be configured for A1's cpuset.cpus?
>> Could you provide a specific example?
>> 
>
> #1> echo "0-1" > A1/cpuset.cpus
> #2> echo "root" > A1/cpuset.cpus.partition
> #3> echo "0-1" > A2/cpuset.cpus
> #4> echo "root" > A2/cpuset.cpus.partition
> mkdir A4
> mkdir A5
> echo "0" > A4/cpuset.cpus
> echo $$ > A4/cgroup.procs
> echo "1" > A5/cpuset.cpus
> echo $$ > A5/cgroup.procs
>
>You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
>
>However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is
>a typical usage scenario.
>

I'm sorry, I didn't quite understand the point you were trying to make with this example.

If that's the case

     root cgroup
          |
          A1
       / /  \ \
     A2 A3  A4 A5

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.

Even with this patch applied, this result will not change.

>
>> Additionally, processes cannot be migrated into a cgroup whose 
>> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
>> 
>> So why does applying this patch enable such migration?

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week ago

On 2025/11/24 18:20, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
>> On 2025/11/21 18:32, Sun Shaojie wrote:
>>> Hi, Ridong,
>>>
>>> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>>>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>>>> I have carefully considered the scenario where parent effective CPUs are 
>>>>> empty, which corresponds to the following two cases. (After apply this patch).
>>>>>
>>>>>    root cgroup
>>>>>         |
>>>>>        A1
>>>>>       /  \
>>>>>     A2    A3
>>>>>
>>>>> Case 1:
>>>>>  Step:
>>>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>>>
>>>>>  After step #4, 
>>>>>
>>>>>                 |      A1      |      A2      |      A3      |
>>>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>>>  effective_cpus |              | 0-1          |              |
>>>>>  prstate        | root         | root         | member       |
>>>>>
>>>>>  After step #4, A3's effective CPUs is empty.
>>>>>
>>>>
>>>> That may be a corner case is unexpected.
>>>>
>>>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>>>
>>>>
>>>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>>>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>>>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
>>>
>>> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
>>> And what parameters should be configured for A1's cpuset.cpus?
>>> Could you provide a specific example?
>>>
>>
>> #1> echo "0-1" > A1/cpuset.cpus
>> #2> echo "root" > A1/cpuset.cpus.partition
>> #3> echo "0-1" > A2/cpuset.cpus
>> #4> echo "root" > A2/cpuset.cpus.partition
>> mkdir A4
>> mkdir A5
>> echo "0" > A4/cpuset.cpus
>> echo $$ > A4/cgroup.procs
>> echo "1" > A5/cpuset.cpus
>> echo $$ > A5/cgroup.procs
>>
>> You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
>>
>> However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is
>> a typical usage scenario.
>>
> 
> I'm sorry, I didn't quite understand the point you were trying to make with this example.
> 
> If that's the case
> 
>      root cgroup
>           |
>           A1
>        / /  \ \
>      A2 A3  A4 A5
> 
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0-1" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
>  mkdir A4
>  mkdir A5
>  echo "0" > A4/cpuset.cpus

If we don't apply your patch, A2 will be invalidated.

>  echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
>  echo "1" > A5/cpuset.cpus
>  echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.
> 
> Even with this patch applied, this result will not change.
> 

You can try it yourself; this is the result I got:

# mkdir A1
# echo "0-1" > A1/cpuset.cpus
# echo "root" > A1/cpuset.cpus.partition
# cd A1/
# mkdir A2
# mkdir A4
# mkdir A5
# echo "0-1" > A2/cpuset.cpus
# echo "root" > A2/cpuset.cpus.partition
#
# echo "0" > A4/cpuset.cpus
# cat A2/cpuset.cpus
0-1
# cat A2/cpuset.cpus.partition
root invalid
# cat A4/cpuset.cpus.effective
0

>>
>>> Additionally, processes cannot be migrated into a cgroup whose 
>>> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
>>>
>>> So why does applying this patch enable such migration?
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 5 days, 13 hours ago
Hi, Ridong,

On Mon, 24 Nov 2025 19:33:54, Chen Ridong wrote:
>On 2025/11/24 18:20, Sun Shaojie wrote:
>> I'm sorry, I didn't quite understand the point you were trying to make with this example.
>> 
>> If that's the case
>> 
>>      root cgroup
>>           |
>>           A1
>>        / /  \ \
>>      A2 A3  A4 A5
>> 
>>  #1> echo "0-1" > A1/cpuset.cpus
>>  #2> echo "root" > A1/cpuset.cpus.partition
>>  #3> echo "0-1" > A2/cpuset.cpus
>>  #4> echo "root" > A2/cpuset.cpus.partition
>>  mkdir A4
>>  mkdir A5
>>  echo "0" > A4/cpuset.cpus
>
>If we don't apply your patch, A2 will be invalidated.
>
>>  echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
>>  echo "1" > A5/cpuset.cpus
>>  echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.
>> 
>> Even with this patch applied, this result will not change.
>> 
>
>You can have a try, the result I got:
>
># mkdir A1
># echo "0-1" > A1/cpuset.cpus
># echo "root" > A1/cpuset.cpus.partition
># cd A1/
># mkdir A2
># mkdir A4
># mkdir A5
># echo "0-1" > A2/cpuset.cpus
># echo "root" > A2/cpuset.cpus.partition
>#
># echo "0" > A4/cpuset.cpus
># cat A2/cpuset.cpus
>0-1
># cat A2/cpuset.cpus.partition
>root invalid
># cat A4/cpuset.cpus.effective
>0

A4's cpuset.cpus.effective is 0 because A2 changed from root to root invalid. 
However, the purpose of this patch is precisely to keep A2 as "root".

Before 'echo "0" > A4/cpuset.cpus', A4 is aware that its cpuset.cpus.effective
is empty and that its parent's cpuset.cpus.effective is also empty. Therefore,
after executing 'echo "0" > A4/cpuset.cpus', A4 should anticipate the 
possibility that it may not be allocated any available CPUs.

Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 1 week, 5 days ago
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
> 
> This patch applies only to the following two cases:
> 
> Assume the machine has 4 CPUs (0-3).
> 
>    root cgroup
>       /    \
>     A1      B1
> 
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
> 
>  Table 1.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
> 
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
> 
>  Table 1.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
> 
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

(Thanks for working this out, Shaojie.)

> 
>  Table 2.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
> 
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
> 
>  Table 2.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
> 
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit
preference ordering between siblings (here first comes, first served)
and so the effective config cannot be derived just from the applied
values :-/

Do you actually want to achieve this or is it an implementation
side-effect of the Case 1 scenario that you want to achieve?


Thanks,
Michal

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 4 days ago
Hi, Michal,

On Wed, 19 Nov 2025 14:20:25, Michal Koutný wrote:
>On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>> 
>> This patch applies only to the following two cases:
>> 
>> Assume the machine has 4 CPUs (0-3).
>> 
>>    root cgroup
>>       /    \
>>     A1      B1
>> 
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>> 
>>  Table 1.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>> 
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>> 
>>  Table 1.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>> 
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>
>(Thanks for working this out, Shaojie.)
>
>> 
>>  Table 2.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>> 
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>> 
>>  Table 2.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>> 
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
>
>Admittedly, I don't like this change because it relies on implicit
>preference ordering between siblings (here first comes, first served)
>and so the effective config cannot be derived just from the applied
>values :-/
>
>Do you actually want to achieve this or is it an implementation
>side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it 
follows the same logic as Case 1.

However, I didn't fully understand what you meant by "implicit preference 
ordering between siblings (here first comes, first served)."
Could you provide an example?

As for your point that "the effective config cannot be derived just from 
the applied values," even before this patch, we couldn't derive the final 
effective configuration solely from the applied values.

For example, consider the following scenario (without this patch applied):
Table 1:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |

Table 2:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
 #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |

After step #3, both Table 1 and Table 2 have identical value settings, 
yet A1's partition state differs between them.
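
One way to check this is to dump the applied values next to the resulting
partition state after each sequence. The following is a minimal sketch, not
part of the patch or the selftests: the helper name dump_cpuset_state is
hypothetical, and the parent cgroup directory is passed in as an argument
rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: for each child cpuset under "$1", print
#   <name> cpus=<cpuset.cpus> partition=<cpuset.cpus.partition>
# so that applied values and resulting state can be compared side by side.
dump_cpuset_state() {
    parent=$1
    for dir in "$parent"/*/; do
        # Skip entries that do not look like cpuset-enabled cgroups.
        [ -f "$dir/cpuset.cpus.partition" ] || continue
        name=$(basename "$dir")
        cpus=$(cat "$dir/cpuset.cpus" 2>/dev/null)
        part=$(cat "$dir/cpuset.cpus.partition")
        printf '%s cpus=%s partition=%s\n' "$name" "$cpus" "$part"
    done
}
```

Running it after step #3 of each table should print the same cpus= values in
both cases, which makes any difference in the partition= column easy to spot.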


Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 5 days, 11 hours ago
On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> >Do you actually want to achieve this or is it an implementation
> >side-effect of the Case 1 scenario that you want to achieve?
> 
> Yes, this is indeed the functionality I intended to achieve, as I find it 
> follows the same logic as Case 1.

So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
be taken away from you by any sibling, correct?
My reasoning is that the siblings should be under one management entity
and therefore such overcommitment should be avoided already in the
configuration. Invalidating all conflicting siblings is then the most
fair result achievable.
B1 is a second-class partition _only_ because it starts later or why is
it OK to not fulfill its requirement?

[1] Note that A1 should still watch its cpuset.cpus.partition if it
takes exclusivity seriously because its cpus may be taken away by
hot(un)plug or ancestry reconfiguration.
	

> As for your point that "the effective config cannot be derived just from 
> the applied values," even before this patch, we couldn't derive the final 
> effective configuration solely from the applied values.
> 
> For example, consider the following scenario: (not apply this patch)
> Table 1:
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
> 
> Table 2:
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
>  #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
> 
> After step #3, both Table 1 and Table 2 have identical value settings, 
> yet A1's partition state differs between them.

Aha, I must admit I didn't expect that. IMO, nothing (documented)
prevents the latter (Table 2) behavior (here I'm referring to
cpuset.cpus, not sure about cpuset.cpus.exclusive).
Which of Table 1 or Table 2 do you prefer?

Thanks,
Michal

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 16 hours ago
Hi, Michal,

On Wed, 26 Nov 2025 15:13:13, Michal Koutný wrote:
>On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> >Do you actually want to achieve this or is it an implementation
>> >side-effect of the Case 1 scenario that you want to achieve?
>> 
>> Yes, this is indeed the functionality I intended to achieve, as I find it 
>> follows the same logic as Case 1.
>
>So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
>be taken away from you by any sibling, correct?
>My reasoning is that the siblings should be under one management entity
>and therefore such overcommitment should be avoided already in the
>configuration. Invalidating all conflicting siblings is then the most
>fair result achievable.
>B1 is a second-class partition _only_ because it starts later or why is
>it OK to not fulfill its requirement?
>

If the siblings are under a single management entity, that certainly works.
But what if there are multiple administrative users? Should we really
violate other users' requirements just to satisfy one user's requirement?
Given this, first-come-first-served might be fairer.

>[1] Note that A1 should still watch its cpuset.cpus.partition if it
>takes exclusivity seriously because its cpus may be taken away by
>hot(un)plug or ancestry reconfiguration.

Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 5 days ago

On 2025/11/26 22:13, Michal Koutný wrote:
> On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>> Do you actually want to achieve this or is it an implementation
>>> side-effect of the Case 1 scenario that you want to achieve?
>>
>> Yes, this is indeed the functionality I intended to achieve, as I find it 
>> follows the same logic as Case 1.
> 
> So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
> be taken away from you by any sibling, correct?
> My reasoning is that the siblings should be under one management entity
> and therefore such overcommitment should be avoided already in the
> configuration. Invalidating all conflicting siblings is then the most
> fair result achievable.
> B1 is a second-class partition _only_ because it starts later or why is
> it OK to not fulfill its requirement?
> 
> [1] Note that A1 should still watch its cpuset.cpus.partition if it
> takes exclusivity seriously because its cpus may be taken away by
> hot(un)plug or ancestry reconfiguration.
> 	
> 
>> As for your point that "the effective config cannot be derived just from 
>> the applied values," even before this patch, we couldn't derive the final 
>> effective configuration solely from the applied values.
>>
>> For example, consider the following scenario: (not apply this patch)
>> Table 1:
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
>>
>> Table 2:
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
>>  #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
>>
>> After step #3, both Table 1 and Table 2 have identical value settings, 
>> yet A1's partition state differs between them.
> 

This is a corner case that should be fixed, and I have sent a patch for it:

https://lore.kernel.org/cgroups/20251115093140.1121329-1-chenridong@huaweicloud.com/

> Aha, I must admit I didn't expect that. IMO, nothing (documented)
> prevents the latter (Table 2) behavior (here I'm referring to
> cpuset.cpus, not sure about cpuset.cpus.exclusive).
> Which of Table 1 or Table 2 do you prefer?
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 5 days ago

On 2025/11/19 21:20, Michal Koutný wrote:
> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>>
>> This patch applies only to the following two cases:
>>
>> Assume the machine has 4 CPUs (0-3).
>>
>>    root cgroup
>>       /    \
>>     A1      B1
>>
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>>
>>  Table 1.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>>
>>  Table 1.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>>
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
> 
> (Thanks for working this out, Shaojie.)
> 
>>
>>  Table 2.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>>
>>  Table 2.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
> 
> Admittedly, I don't like this change because it relies on implicit
> preference ordering between siblings (here first comes, first served)

Agreed. If we only invalidate the latter one, I think that, regardless of the
implementation approach, we may end up with different results depending on the
order of operations.

> and so the effective config cannot be derived just from the applied
> values :-/
> 
> Do you actually want to achieve this or is it an implementation
> side-effect of the Case 1 scenario that you want to achieve?
> 
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 4 days ago
Hi, Ridong,

On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:
>On 2025/11/19 21:20, Michal Koutný wrote:
>> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>>  Table 2.1: Before applying this patch
>>>  Step                                       | A1's prstate | B1's prstate |
>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>>
>>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>>> always have at least CPU 2 available. This makes it unnecessary to mark
>>> A1 as "root invalid".
>>>
>>>  Table 2.2: After applying this patch
>>>  Step                                       | A1's prstate | B1's prstate |
>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>>
>>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>>> is no need to change A1 from "root" to "root invalid".
>> 
>> Admittedly, I don't like this change because it relies on implicit
>> preference ordering between siblings (here first comes, first served)
>
>Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we
>may end up with different results depending on the order of operations.


I don't understand the "order of operations" mentioned here. After reviewing
the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>With the result you expect, would we observe the following behaviors:
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "1-2"  > B1/cpuset.cpus
>#5> echo "root" > A1/cpuset.cpus.partition
>#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "1-2"  > B1/cpuset.cpus
>#5> echo "root" > B1/cpuset.cpus.partition
>#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>
>Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are 
as follows:
 
 #1> mkdir -p A1
 #2> mkdir -p B1
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
 #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
 #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
 #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|

 #1> mkdir -p A1
 #2> mkdir -p B1
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
 #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
 #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
 #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|

Moreover, even without applying this patch, the result remains the same,
because modifying cpuset.cpus.partition does not disable its siblings' partitions.

So, what are the specific issues that you believe would arise?


Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 4 days ago

On 2025/11/20 21:07, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:
>> On 2025/11/19 21:20, Michal Koutný wrote:
>>> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>>>  Table 2.1: Before applying this patch
>>>>  Step                                       | A1's prstate | B1's prstate |
>>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>>>
>>>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>>>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>>>> always have at least CPU 2 available. This makes it unnecessary to mark
>>>> A1 as "root invalid".
>>>>
>>>>  Table 2.2: After applying this patch
>>>>  Step                                       | A1's prstate | B1's prstate |
>>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>>>
>>>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>>>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>>>> is no need to change A1 from "root" to "root invalid".
>>>
>>> Admittedly, I don't like this change because it relies on implicit
>>> preference ordering between siblings (here first comes, first served)
>>
>> Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we
>> may end up with different results depending on the order of operations.
> 
> 
> I don't understand the "order of operations" mentioned here. After reviewing
> the previous email content, are you referring to this?
> 
> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>> With the result you expect, would we observe the following behaviors:
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "1-2"  > B1/cpuset.cpus
>> #5> echo "root" > A1/cpuset.cpus.partition
>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "1-2"  > B1/cpuset.cpus
>> #5> echo "root" > B1/cpuset.cpus.partition
>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>
>> Do different operation orders yield different results? If so, this is not what we expect.
> 
> However, after applying this patch, the outcomes of these two examples are 
> as follows:
>  
>  #1> mkdir -p A1
>  #2> mkdir -p B1
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
> 
>  #1> mkdir -p A1
>  #2> mkdir -p B1
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
> 

How about the following two sequences of operations:

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
#5> echo "1-2"  > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition


#1> mkdir -p A1
#2> mkdir -p B1
#5> echo "1-2"  > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition

Will these two sequences yield the same result?

As a key requirement: Regardless of the order in which we apply the configurations, identical final
settings should always result in identical system states. We need to confirm if this holds true here.
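
To make the requirement concrete, the check can be sketched as a small
harness that applies the same writes in two different orders and compares
the final partition states. This is only an illustration: the helper names
apply_steps and final_state and the "file=value" step format are
hypothetical, and the parent cgroup directory is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: apply steps of the form "relative/file=value"
# under the cgroup directory "$1", in the order given.
apply_steps() {
    root=$1
    shift
    for step in "$@"; do
        file=${step%%=*}     # text before the first '='
        value=${step#*=}     # text after the first '='
        # A rejected write is reported but does not abort the sequence.
        printf '%s\n' "$value" > "$root/$file" || echo "write failed: $step" >&2
    done
}

# Hypothetical helper: print the partition state of A1 and B1 under "$1".
final_state() {
    for cg in A1 B1; do
        printf '%s:%s\n' "$cg" "$(cat "$1/$cg/cpuset.cpus.partition")"
    done
}

# Example (the parent directory "$P" is an assumption):
#   apply_steps "$P" A1/cpuset.cpus=0-1 A1/cpuset.cpus.partition=root \
#                    B1/cpuset.cpus=1-2 B1/cpuset.cpus.partition=root
#   first=$(final_state "$P")
#   ... remove and recreate A1 and B1, apply the B1 steps before the A1 steps ...
#   [ "$first" = "$(final_state "$P")" ] && echo "same final state"
```

If the two recorded states differ, the configuration is order-dependent in
the sense described above.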

> Moreover, even without applying this patch, the result remains the same,
> because modifying cpuset.cpus.partition does not disable its siblings' partitions.
> 
> So, what are the specific issues that you believe would arise?
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week, 3 days ago
Hi, Ridong,

On Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
>On 2025/11/20 21:07, Sun Shaojie wrote:
>> I don't understand the "order of operations" mentioned here. After reviewing
>> the previous email content, are you referring to this?
>> 
>> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>>> With the result you expect, would we observe the following behaviors:
>>>
>>> #1> mkdir -p A1
>>> #2> mkdir -p B1
>>> #3> echo "0-1"  > A1/cpuset.cpus
>>> #4> echo "1-2"  > B1/cpuset.cpus
>>> #5> echo "root" > A1/cpuset.cpus.partition
>>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>>
>>> #1> mkdir -p A1
>>> #2> mkdir -p B1
>>> #3> echo "0-1"  > A1/cpuset.cpus
>>> #4> echo "1-2"  > B1/cpuset.cpus
>>> #5> echo "root" > B1/cpuset.cpus.partition
>>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>>
>>> Do different operation orders yield different results? If so, this is not what we expect.
>> 
>> However, after applying this patch, the outcomes of these two examples are 
>> as follows:
>>  
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
>> 
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
>> 
>
>How about the following two sequences of operations:
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "root" > A1/cpuset.cpus.partition
>#5> echo "1-2"  > B1/cpuset.cpus
>#6> echo "root" > B1/cpuset.cpus.partition
>
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#5> echo "1-2"  > B1/cpuset.cpus
>#6> echo "root" > B1/cpuset.cpus.partition
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "root" > A1/cpuset.cpus.partition
>
>Will these two sequences yield the same result?

>As a key requirement: Regardless of the order in which we apply the configurations, identical final
>settings should always result in identical system states. We need to confirm if this holds true here.

Is this truly a key requirement? It appears this requirement wasn't met even
before applying my patch.

The example below, which does not use this patch, demonstrates how different
sequences with identical configurations can still lead to different system
states.

 #1> mkdir -p A1
 #2> mkdir -p B1                            | A1's prstate | B1's prstate |
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
 #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
 #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
 #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |

 #1> mkdir -p A1
 #2> mkdir -p B1                            | A1's prstate | B1's prstate |
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
 #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
 #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
 #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |

Even without this patch, the result can still differ.


Thanks,
Sun Shaojie

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 week, 3 days ago

On 2025/11/21 18:33, Sun Shaojie wrote:
> Hi, Ridong,
> 
> Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>> I don't understand the "order of operations" mentioned here. After reviewing
>>> the previous email content, are you referring to this?
>>>
>>> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>>>> With the result you expect, would we observe the following behaviors:
>>>>
>>>> #1> mkdir -p A1
>>>> #2> mkdir -p B1
>>>> #3> echo "0-1"  > A1/cpuset.cpus
>>>> #4> echo "1-2"  > B1/cpuset.cpus
>>>> #5> echo "root" > A1/cpuset.cpus.partition
>>>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>>>
>>>> #1> mkdir -p A1
>>>> #2> mkdir -p B1
>>>> #3> echo "0-1"  > A1/cpuset.cpus
>>>> #4> echo "1-2"  > B1/cpuset.cpus
>>>> #5> echo "root" > B1/cpuset.cpus.partition
>>>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>>>
>>>> Do different operation orders yield different results? If so, this is not what we expect.
>>>
>>> However, after applying this patch, the outcomes of these two examples are 
>>> as follows:
>>>  
>>>  #1> mkdir -p A1
>>>  #2> mkdir -p B1
>>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
>>>
>>>  #1> mkdir -p A1
>>>  #2> mkdir -p B1
>>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>>>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
>>>
>>
>> How about the following two sequences of operations:
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "root" > A1/cpuset.cpus.partition
>> #5> echo "1-2"  > B1/cpuset.cpus
>> #6> echo "root" > B1/cpuset.cpus.partition
>>
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #5> echo "1-2"  > B1/cpuset.cpus
>> #6> echo "root" > B1/cpuset.cpus.partition
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "root" > A1/cpuset.cpus.partition
>>
>> Will these two sequences yield the same result?
> 
>> As a key requirement: Regardless of the order in which we apply the configurations, identical final
>> settings should always result in identical system states. We need to confirm if this holds true here.
> 
> Is this truly a key requirement? It appears this requirement wasn't met even
> before applying my patch.
> 

I believe it is a requirement; there may be some corner cases we should fix.

> The example below, which does not use this patch, demonstrates how different
> sequences with identical configurations can still lead to different system
> states.
> 
>  #1> mkdir -p A1
>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>  #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
>  #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
>  #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |
> 

IIUC, you've created this example with the expectation that both A1 and B1 should serve as root
partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus,
cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.

The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8
is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to
passively and automatically transition A1 to a valid partition state.

>  #1> mkdir -p A1
>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>  #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
>  #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
>  #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
> 
> Even without this patch, the result can still differ.
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 week ago
Hi, Ridong,

On Sat, 22 Nov 2025 09:19:39, Chen Ridong wrote:
>On 2025/11/21 18:33, Sun Shaojie wrote:
>> Is this truly a key requirement? It appears this requirement wasn't met even
>> before applying my patch.
>> 
>
>I believe it is a requirement; there may be some corner cases we should fix.
>
>> The example below, which does not use this patch, demonstrates how different
>> sequences with identical configurations can still lead to different system
>> states.
>> 
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>>  #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
>>  #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
>>  #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |
>> 
>
>IIUC, you've created this example with the expectation that both A1 and B1 should serve as root
>partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus,
>cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.
>
>The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8
>is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to
>passively and automatically transition A1 to a valid partition state.
>

So, was the original behavior of invalidating sibling partitions driven by this key requirement?
(As a key requirement: Regardless of the order in which we apply the configurations, identical final
settings should always result in identical system states.)

>>  #1> mkdir -p A1
>>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>>  #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
>>  #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
>>  #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>> 
>> Even without this patch, the result can still differ.
>> 

Thanks,
Sun Shaojie