[PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.

Sun Shaojie posted 1 patch 2 months, 3 weeks ago
There is a newer version of this series
Posted by Sun Shaojie 2 months, 3 weeks ago
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with its sibling partition, the sibling's partition state becomes invalid.
However, this invalidation is often unnecessary. If the cpuset being
modified is exclusive, it should invalidate itself upon conflict.

This patch applies only to the following two cases:

Assume the machine has 4 CPUs (0-3).

   root cgroup
      /    \
    A1      B1

Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus

 Table 1.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs
(0-1) overlap with those requested by B1 (0). However, B1 can actually
use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to
remain as "root".

 Table 1.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |

Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

 Table 2.1: Before applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |

After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
regardless of what conflicting value B1 writes to cpuset.cpus, it will
always have at least CPU 2 available. This makes it unnecessary to mark
A1 as "root invalid".

 Table 2.2: After applying this patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "2" > B1/cpuset.cpus              | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
 #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |

In summary, regardless of how B1 configures its cpuset.cpus, there will
always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
is no need to change A1 from "root" to "root invalid".

All other cases, for example cgroup-v1, remain unaffected.

Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
---
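Note (not part of the patch): the Case 2 steps from the description can
be replayed with a small script, sketched below. CGROOT, step and case2
are hypothetical names introduced here for illustration. The sketch
assumes a cgroup v2 directory that already contains A1 and B1 with the
cpuset controller enabled, and it only prints what cpuset.cpus.partition
reports after each write.

```shell
#!/bin/sh
# Sketch only: replay the Case 2 steps and print both partition states.
# CGROOT is a hypothetical variable; it is assumed to point at a cgroup v2
# directory that already contains the A1 and B1 child cgroups.
CGROOT=${CGROOT:-/sys/fs/cgroup}

# step <relative-file> <value>: write the value, then show A1/B1 states.
step() {
    if ! printf '%s\n' "$2" 2>/dev/null > "$CGROOT/$1"; then
        echo "write failed: $1 <- $2"
    fi
    printf '%-28s <- %-5s A1=[%s] B1=[%s]\n' "$1" "$2" \
        "$(cat "$CGROOT/A1/cpuset.cpus.partition" 2>/dev/null)" \
        "$(cat "$CGROOT/B1/cpuset.cpus.partition" 2>/dev/null)"
}

# The five steps of Case 2, in the order given in Tables 2.1 and 2.2.
case2() {
    step A1/cpuset.cpus           0-1
    step A1/cpuset.cpus.partition root
    step B1/cpuset.cpus           2
    step B1/cpuset.cpus.partition root
    step B1/cpuset.cpus           1-2
}

# Only touch the cgroup tree when explicitly asked to.
if [ "${1:-}" = run ]; then
    case2
fi
```

Saved under a name of your choosing and run as root with the argument
"run", the last printed line is the one to compare against step #5 of
Table 2.1 (without the patch) or Table 2.2 (with the patch).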
 kernel/cgroup/cpuset.c                        | 19 +------------------
 .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..f6a834335ebf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..7d8941f65d84 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -388,10 +388,11 @@ TEST_MATRIX=(
 	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	# An exclusive cpuset.cpus change will invalidate itself.
 	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
-- 
2.25.1

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 2 months, 2 weeks ago
On 11/19/25 5:57 AM, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
>
> This patch applies only to the following two cases:
>
> Assume the machine has 4 CPUs (0-3).
>
>     root cgroup
>        /    \
>      A1      B1
>
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>
>   Table 1.1: Before applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
>
>   Table 1.2: After applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>
>   Table 2.1: Before applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>   #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
>
>   Table 2.2: After applying this patch
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>   #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".
>
> All other cases remain unaffected. For example, cgroup-v1.

This patch is relatively simple. As others have pointed out, there are
inconsistencies depending on the operation ordering.

In the example above, the final configuration is A1:0-1 & B1:1-2. As the 
cpu lists overlap, we can't have both of them as valid partition roots. 
So either one of A1 or B1 is valid or they are both invalid. The current 
code makes them both invalid no matter the operation ordering.  This 
patch will make one of them valid given the operation ordering above. To 
minimize partition invalidation, we will have to live with the fact that 
it will be first-come first-serve as noted by Michal. I am not against 
this, we just have to document it. However, the following operation 
order will still make both of them invalid:

 # echo "0-1" > A1/cpuset.cpus
 # echo "2" > B1/cpuset.cpus
 # echo "1-2" > B1/cpuset.cpus
 # echo "root" > A1/cpuset.cpus.partition
 # echo "root" > B1/cpuset.cpus.partition

To follow the "first-come first-serve" rule, A1 should be valid and B1 
invalid. That is the inconsistency with your current patch. To fix that, 
we still need to relax the overlap checking rule similar to your v4 patch.
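
The overlap argument above (A1:0-1 and B1:1-2 share CPU 1, so both cannot
be valid partition roots) can be checked from user space with a small
helper. This is only an illustrative sketch: expand_cpulist and
cpulists_overlap are hypothetical names, and the kernel performs the
equivalent test on cpumasks rather than on strings.

```shell
#!/bin/sh
# expand_cpulist "0-1,4": print one CPU number per line.
expand_cpulist() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas
    IFS=$old_ifs
    for range in "$@"; do
        first=${range%-*}   # "0-1" -> 0, "4" -> 4
        last=${range#*-}    # "0-1" -> 1, "4" -> 4
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
}

# cpulists_overlap A B: succeed if the two lists share at least one CPU.
cpulists_overlap() {
    for a in $(expand_cpulist "$1"); do
        for b in $(expand_cpulist "$2"); do
            if [ "$a" -eq "$b" ]; then
                return 0
            fi
        done
    done
    return 1
}
```

For example, "cpulists_overlap 0-1 1-2" succeeds, while
"cpulists_overlap 0-1 2" fails, which matches the state before and after
the last cpuset.cpus write in the example above.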

Cheers,
Longman

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months, 2 weeks ago
On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
> lists overlap, we can't have both of them as valid partition roots. So
> either one of A1 or B1 is valid or they are both invalid. The current code
> makes them both invalid no matter the operation ordering.  This patch will
> make one of them valid given the operation ordering above. To minimize
> partition invalidation, we will have to live with the fact that it will be
> first-come first-serve as noted by Michal. I am not against this, we just
> have to document it. However, the following operation order will still make
> both of them invalid:

I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
race conditions in practice.
BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
in this regard?

Thanks,
Michal
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 2 months, 2 weeks ago
On 11/26/25 9:13 AM, Michal Koutný wrote:
> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>> lists overlap, we can't have both of them as valid partition roots. So
>> either one of A1 or B1 is valid or they are both invalid. The current code
>> makes them both invalid no matter the operation ordering.  This patch will
>> make one of them valid given the operation ordering above. To minimize
>> partition invalidation, we will have to live with the fact that it will be
>> first-come first-serve as noted by Michal. I am not against this, we just
>> have to document it. However, the following operation order will still make
>> both of them invalid:
> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
> race conditions in practice.
> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
> in this regard?

Modifications to cpumasks are all serialized by the cpuset_mutex. If you
are referring to 2 or more tasks doing parallel updates to various
cpuset control files of sibling cpusets, the results can actually vary
depending on the order in which those operations end up being serialized.

One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact
that operations on cpuset.cpus.exclusive can fail if the result is not
exclusive WRT sibling cpusets, but becoming a valid partition is
guaranteed unless none of the exclusive CPUs are passed down from the
parent. The use of cpuset.cpus.exclusive is required for creating a remote
partition.

OTOH, changes to cpuset.cpus will never fail, but becoming a valid
partition root is not guaranteed and is limited to the creation of local
partitions only.
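
From user space, the two behaviors described above call for different
checks: a write to cpuset.cpus.exclusive has to be checked for failure,
while the outcome of a cpuset.cpus write has to be read back from
cpuset.cpus.partition. A minimal sketch, with request_exclusive and
request_local_root as hypothetical helper names and the cgroup directory
passed in by the caller:

```shell
#!/bin/sh
# request_exclusive <cgroup-dir> <cpulist>
# The write itself may be rejected, so its status is the result.
request_exclusive() {
    if printf '%s\n' "$2" 2>/dev/null > "$1/cpuset.cpus.exclusive"; then
        echo "accepted"
    else
        echo "rejected"
        return 1
    fi
}

# request_local_root <cgroup-dir> <cpulist>
# The cpuset.cpus write is expected to succeed, so the partition state
# has to be read back to learn whether the partition is valid.
request_local_root() {
    printf '%s\n' "$2" > "$1/cpuset.cpus" || return 1
    printf 'root\n' > "$1/cpuset.cpus.partition" || return 1
    state=$(cat "$1/cpuset.cpus.partition")
    case $state in
    *invalid*)
        echo "invalid: $state"
        return 1
        ;;
    *)
        echo "valid: $state"
        ;;
    esac
}
```

A caller would use something like "request_local_root /sys/fs/cgroup/A1 0-1"
and treat a non-zero status as "no valid partition", whichever of the two
paths was taken.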

Does that answer your question?

Cheers,
Longman

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months ago
Hi Waiman.

On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long <llong@redhat.com> wrote:
> Modification to cpumasks are all serialized by the cpuset_mutex. If you are
> referring to 2 or more tasks doing parallel updates to various cpuset
> control files of sibling cpusets, the results can actually vary depending on
> the actual serialization results of those operations.

I meant the latter: the difference in results when concurrent tasks
do the update (e.g. two containers start in parallel). I don't see an
issue with the race wrt consistency of in-kernel data. We're on the same
page here.

> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact
> that operations on cpuset.cpus.exclusive can fail if the result is not
> exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed
> unless none of the exclusive CPUs are passed down from the parent. The use
> of cpuset.cpus.exclusive is required for creating remote partition.
> 
> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition
> root is not guaranteed and is limited to the creation of local partition
> only.
> 
> Does that answer your question?

It does help my understanding. Do you envision that remote and local
partitions should be used together (in one subtree)?

Thanks,
Michal
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 1 month, 4 weeks ago
On 12/8/25 9:32 AM, Michal Koutný wrote:
> Hi Waiman.
>
> On Wed, Nov 26, 2025 at 02:43:50PM -0500, Waiman Long <llong@redhat.com> wrote:
>> Modification to cpumasks are all serialized by the cpuset_mutex. If you are
>> referring to 2 or more tasks doing parallel updates to various cpuset
>> control files of sibling cpusets, the results can actually vary depending on
>> the actual serialization results of those operations.
> I meant the latter when the difference in results when concurrent tasks
> do the update (e.g. two containers start in parallel), I don't see an
> issue with the race wrt consistency of in-kernel data. We're on the same
> page here.
>
>> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact
>> that operations on cpuset.cpus.exclusive can fail if the result is not
>> exclusive WRT sibling cpusets, but becoming a valid partition is guaranteed
>> unless none of the exclusive CPUs are passed down from the parent. The use
>> of cpuset.cpus.exclusive is required for creating remote partition.
>>
>> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition
>> root is not guaranteed and is limited to the creation of local partition
>> only.
>>
>> Does that answer your question?
> It does help my understanding. Do you envision that remote and local
> partitions should be used together (in one subtree)?

It should be rare to have both remote and local partitions enabled in the
same system, though it is not disallowed. Local partitions should
only be used on systems that run a small number of applications, with one
or just a few that need partition support. For systems that run a large
number of containerized applications, like a Kubernetes-managed system,
local partitions cannot be used because of the way container management
is done: the actual cgroups associated with a container can be a
bit far from the cgroup root. Remote partitions were created for such a
use case, where local partitions cannot be used at all.

Cheers,
Longman

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/27 3:43, Waiman Long wrote:
> On 11/26/25 9:13 AM, Michal Koutný wrote:
>> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@redhat.com> wrote:
>>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>>> lists overlap, we can't have both of them as valid partition roots. So
>>> either one of A1 or B1 is valid or they are both invalid. The current code
>>> makes them both invalid no matter the operation ordering.  This patch will

I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
make it more difficult for users to understand why the cpuset.cpus they configured do not match the
effective CPUs in use, and why different operation orders yield different results.

In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
created under A1 will end up with empty effective CPUs—and this is not a desired behavior.

   root cgroup
        |
       A1
      /  \
    A2    A3...

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs


[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
requirement?" --Michal.

>>> make one of them valid given the operation ordering above. To minimize
>>> partition invalidation, we will have to live with the fact that it will be
>>> first-come first-serve as noted by Michal. I am not against this, we just
>>> have to document it. However, the following operation order will still make
>>> both of them invalid:
>> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
>> race conditions in practice.
>> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
>> in this regard?
> 
> Modification to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more
> tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can
> actually vary depending on the actual serialization results of those operations.
> 
> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on
> cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a
> valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The
> use of cpuset.cpus.exclusive is required for creating remote partition.
> 
> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed
> and is limited to the creation of local partition only.
> 
> Does that answer your question?
> 
> Cheers,
> Longman
> 

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 1 week ago
Hi, Ridong,

On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:
>I have to admit that I prefer the current implementation.
>
>At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
>make it more difficult for users to understand why the cpuset.cpus they configured do not match the
>effective CPUs in use, and why different operation orders yield different results.

As for "different operation orders yield different results", below is an
example that is not a corner case.

    root cgroup
      /    \
     A1    B1

 #1> echo "0" > A1/cpuset.cpus
 #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error

 #1> echo "0-1" > B1/cpuset.cpus.exclusive
 #2> echo "0" > A1/cpuset.cpus

>
>In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
>created under A1 will end up with empty effective CPUs—and this is not a desired behavior.
>
>   root cgroup
>        |
>       A1
>      /  \
>    A2    A3...
>
> #1> echo "0-1" > A1/cpuset.cpus
> #2> echo "root" > A1/cpuset.cpus.partition
> #3> echo "0-1" > A2/cpuset.cpus
> #4> echo "root" > A2/cpuset.cpus.partition
> mkdir A4
> mkdir A5
> echo "0" > A4/cpuset.cpus
> echo $$ > A4/cgroup.procs
> echo "1" > A5/cpuset.cpus
> echo $$ > A5/cgroup.procs
>

If A2...A5 all belong to the same user, and that user wants both A4 and A5 
to have effective CPUs, then the user should also understand that A2 needs
to be adjusted to "member" instead of "root".

If A2...A5 belong to different users, must satisfying user A4’s requirement
come at the expense of user A2’s requirement? That is not fair.

>
>[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
>requirement?" --Michal.

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 1 month, 4 weeks ago

On 2025/12/1 17:44, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:
>> I have to admit that I prefer the current implementation.
>>
>> At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
>> make it more difficult for users to understand why the cpuset.cpus they configured do not match the
>> effective CPUs in use, and why different operation orders yield different results.
> 
> As for "different operation orders yield different results", Below is an
> example that is not a corner case.
> 
>     root cgroup
>       /    \
>      A1    B1
> 
>  #1> echo "0" > A1/cpuset.cpus
>  #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error
> 
>  #1> echo "0-1" > B1/cpuset.cpus.exclusive
>  #2> echo "0" > A1/cpuset.cpus
> 

You're looking at one rule, but there's another one: Longman pointed out that setting cpuset.cpus
should never fail.

>>
>> In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
>> created under A1 will end up with empty effective CPUs—and this is not a desired behavior.
>>
>>   root cgroup
>>        |
>>       A1
>>      /  \
>>    A2    A3...
>>
>> #1> echo "0-1" > A1/cpuset.cpus
>> #2> echo "root" > A1/cpuset.cpus.partition
>> #3> echo "0-1" > A2/cpuset.cpus
>> #4> echo "root" > A2/cpuset.cpus.partition
>> mkdir A4
>> mkdir A5
>> echo "0" > A4/cpuset.cpus
>> echo $$ > A4/cgroup.procs
>> echo "1" > A5/cpuset.cpus
>> echo $$ > A5/cgroup.procs
>>
> 
> If A2...A5 all belong to the same user, and that user wants both A4 and A5 
> to have effective CPUs, then the user should also understand that A2 needs
> to be adjusted to "member" instead of "root".
> 
> if A2...A5 belong to different users, must satisfying user A4’s requirement
> come at the expense of user A2’s requirement? That is not fair.
> 

Regarding cpuset usage with Docker: when binding CPUs at container startup, do you check the sibling
CPUs in use? Without this check, A2 will not be invalidated.

Your patch has been discussed for a while. It seems to make the rules more complex.

>>
>> [1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
>> requirement?" --Michal.
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict
Posted by Sun Shaojie 1 month, 3 weeks ago
Hi, Ridong

On Sat, 13 Dec 2025 08:52:11 +0800, Chen Ridong wrote:
>On 2025/12/1 17:44, Sun Shaojie wrote:
>> Hi, Ridong,
>> 
>> On Thu, 27 Nov 2025 09:55:21, Chen Ridong wrote:
>>> I have to admit that I prefer the current implementation.
>>>
>>> At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
>>> make it more difficult for users to understand why the cpuset.cpus they configured do not match the
>>> effective CPUs in use, and why different operation orders yield different results.
>> 
>> As for "different operation orders yield different results", Below is an
>> example that is not a corner case.
>> 
>>     root cgroup
>>       /    \
>>      A1    B1
>> 
>>  #1> echo "0" > A1/cpuset.cpus
>>  #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error
>> 
>>  #1> echo "0-1" > B1/cpuset.cpus.exclusive
>>  #2> echo "0" > A1/cpuset.cpus
>> 
>
>You're looking at one rule, but there's another one—Longman pointed out that setting cpuset.cpu
>should never fail.

Precisely because I know that setting cpuset.cpus should never fail,
I provided this example, which is why it demonstrates that "different
operation orders yield different results."

>>>
>>> In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
>>> created under A1 will end up with empty effective CPUs—and this is not a desired behavior.
>>>
>>>   root cgroup
>>>        |
>>>       A1
>>>      /  \
>>>    A2    A3...
>>>
>>> #1> echo "0-1" > A1/cpuset.cpus
>>> #2> echo "root" > A1/cpuset.cpus.partition
>>> #3> echo "0-1" > A2/cpuset.cpus
>>> #4> echo "root" > A2/cpuset.cpus.partition
>>> mkdir A4
>>> mkdir A5
>>> echo "0" > A4/cpuset.cpus
>>> echo $$ > A4/cgroup.procs
>>> echo "1" > A5/cpuset.cpus
>>> echo $$ > A5/cgroup.procs
>>>
>> 
>> If A2...A5 all belong to the same user, and that user wants both A4 and A5 
>> to have effective CPUs, then the user should also understand that A2 needs
>> to be adjusted to "member" instead of "root".
>> 
>> if A2...A5 belong to different users, must satisfying user A4’s requirement
>> come at the expense of user A2’s requirement? That is not fair.
>> 
>
>Regarding cpuset usage with Docker: when binding CPUs at container startup, do you check the sibling
>CPUs in use? Without this check, A2 will not be invalidated.
>
>Your patch has been discussed for a while. It seems to make the rules more complex.

My aim is to safeguard the independence of sibling nodes while adhering to
existing rules. I continuously update the patch to uphold these rules, as
seen in the recently updated patch v6
(https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.cn/).

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months ago
Hello.

On Mon, Dec 01, 2025 at 05:44:47PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> As for "different operation orders yield different results", Below is an
> example that is not a corner case.
> 
>     root cgroup
>       /    \
>      A1    B1
> 
>  #1> echo "0" > A1/cpuset.cpus
>  #2> echo "0-1" > B1/cpuset.cpus.exclusive --> return error
> 
>  #1> echo "0-1" > B1/cpuset.cpus.exclusive
>  #2> echo "0" > A1/cpuset.cpus

Here it is a combination of remote and local partitions.
I'd like to treat the two approaches separately and better not consider
their combination.

The idea (and permissions check AFAICS) behind remote partitions is to
allow "stealing" CPU ownership so cpuset.cpus.exclusive has different
behavior.

> >   root cgroup
> >        |
> >       A1  //MK: A4 A5 here?
> >      /  \
> >    A2    A3... //MK: A4 A5 or here?
> >
> > #1> echo "0-1" > A1/cpuset.cpus
> > #2> echo "root" > A1/cpuset.cpus.partition
> > #3> echo "0-1" > A2/cpuset.cpus
> > #4> echo "root" > A2/cpuset.cpus.partition
> > mkdir A4
> > mkdir A5
> > echo "0" > A4/cpuset.cpus
> > echo $$ > A4/cgroup.procs
> > echo "1" > A5/cpuset.cpus
> > echo $$ > A5/cgroup.procs
> >
> 
> If A2...A5 all belong to the same user, and that user wants both A4 and A5 
> to have effective CPUs, then the user should also understand that A2 needs
> to be adjusted to "member" instead of "root".
> 
> if A2...A5 belong to different users, must satisfying user A4’s requirement
> come at the expense of user A2’s requirement? That is not fair.

If A4 is a sibling at the level of A1, then A2 must be stripped of its
CPUs to honor the hierarchy hence the apparent unfairness.

If A4 is a sibling at the level of A2 and they have different owning
users, their respective cpuset.cpus should only be writable by A1's user
(the one who distributes the cpus) so that any arbitration between the
siblings is avoided.

0.02€,
Michal
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months ago
Hi, Michal,

On Mon, 8 Dec 2025 15:31:52 +0100, Michal Koutný wrote:
>>>   root cgroup
>>>        |
>>>       A1  //MK: A4 A5 here?
>>>      /  \
>>>    A2    A3... //MK: A4 A5 or here?
>>>
>>> #1> echo "0-1" > A1/cpuset.cpus
>>> #2> echo "root" > A1/cpuset.cpus.partition
>>> #3> echo "0-1" > A2/cpuset.cpus
>>> #4> echo "root" > A2/cpuset.cpus.partition
>>> mkdir A4
>>> mkdir A5
>>> echo "0" > A4/cpuset.cpus
>>> echo $$ > A4/cgroup.procs
>>> echo "1" > A5/cpuset.cpus
>>> echo $$ > A5/cgroup.procs
>>>
>>
>>If A2...A5 all belong to the same user, and that user wants both A4 and A5 
>>to have effective CPUs, then the user should also understand that A2 needs
>>to be adjusted to "member" instead of "root".
>>
>>if A2...A5 belong to different users, must satisfying user A4’s requirement
>>come at the expense of user A2’s requirement? That is not fair.
>
>If A4 is a sibling at the level of A1, then A2 must be stripped of its
>CPUs to honor the hierarchy hence the apparent unfairness.
>
>If A4 is a sibling at the level of A2 and they have different owning
>users, their respective cpuset.cpus should only be writable by A1's user
>(the one who distributes the cpus) so that any arbitration between the
>siblings is avoided.

Regardless of whether A1 through A5 belong to the same user or different
users, arbitration conflicts between sibling nodes can still occur (e.g.,
due to user misconfiguration). The key question is: when such a conflict
arises, should all sibling nodes be invalidated, or only the node that
triggered the conflict?

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months ago
On Wed, Dec 10, 2025 at 06:11:08PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> Regardless of whether A1 through A5 belong to the same user or different
> users, arbitration conflicts between sibling nodes can still occur (e.g.,
> due to user misconfiguration). The key question is: when such a conflict
> arises, should all sibling nodes be invalidated, or only the node that
> triggered the conflict?

Any serious [1] affinity users should watch cpuset.cpus.partition
already (since it can be invalidated by hotplug or, IMO more probably,
by ancestor re-configuration). Do you agree?

Then I'd say it's reasonable to invalidate all (same reasoning: it
should not depend on the order in which siblings are configured; I am
considering local partitions). What would you see as the upsides of
invalidating only the last offender (under the assumption above about
watching)?

Thanks,
Michal

[1] The others may make use of the proposed cpu.max.concurrency [2]
[2] https://lpc.events/event/18/contributions/1978/
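
As a rough illustration of what "watching" could look like from userspace:
cpuset.cpus.partition reads back "member", "root" or "isolated", and an
invalid partition reads back with " invalid (<reason>)" appended. The helper
names prs_classify and prs_watch and the one-second default interval are
invented for this sketch. It polls with sleep only to stay dependency-free; a
real watcher would more likely wait for poll/inotify events on the file.

```shell
#!/bin/sh
# prs_classify FILE: print "valid", "invalid", "member" or "unknown" for the
# partition state stored in FILE (e.g. <cgroup>/cpuset.cpus.partition).
prs_classify() {
	state=
	read -r state < "$1" || [ -n "$state" ] || return 1
	case "$state" in
	*" invalid"*)   echo invalid ;;
	root|isolated)  echo valid ;;
	member)         echo member ;;
	*)              echo unknown ;;
	esac
}

# prs_watch FILE [INTERVAL]: print a line whenever the classified state
# changes. Runs until interrupted or until FILE can no longer be read.
prs_watch() {
	last=
	while :; do
		cur=$(prs_classify "$1") || return 1
		if [ "$cur" != "$last" ]; then
			echo "$1: $cur"
			last=$cur
		fi
		sleep "${2:-1}"
	done
}
```

For instance, `prs_watch /sys/fs/cgroup/A1/cpuset.cpus.partition` would print
a new line each time A1 flips between valid and invalid (the cgroup path is
only an example).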

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 1 month, 4 weeks ago
Hi, Michal

On Thu, 11 Dec 2025 11:59:27 +0100, Michal Koutný wrote:
>On Wed, Dec 10, 2025 at 06:11:08PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Regardless of whether A1 through A5 belong to the same user or different
>> users, arbitration conflicts between sibling nodes can still occur (e.g.,
>> due to user misconfiguration). The key question is: when such a conflict
>> arises, should all sibling nodes be invalidated, or only the node that
>> triggered the conflict?
>
>Any serious [1] affinity users should watch for cpuset.cpus.partition
>already (since it can be invalidated by hotplug or IMO more probable
>ancestor re-configuration). Do you agree?
>
>Then I'd say it's reasonable to invalidate all (same reasoning -- it
>doesn't matter on the order in which siblings are configured, I consider
>local partitions). What would you see as the upsides of invalidating
>only the last offender (under the assumption above about watching)?

I agree that users should watch the state of their cpuset.cpus.partition.
Moreover, assuming the user is watching, there is no harm in invalidating
only the last conflicting partition.

For example

           root cgroup
                |
   --------------------------
   |      |     |    |      |    
   A      B    ...   M      N
 (root) (root) ... (root) (root)

Condition: Node N is the last one configured by the user.
           After its configuration, it conflicts with all previous nodes
           (A through M).

When all are invalidated, the user will notice that A-M are all invalidated
because they are watching. If the user wants to restore the exclusivity
of A-M, they need to reconfigure A-M once more, as well as N.

When only the last conflict is invalidated, the user will notice that N is
invalidated, and then they only need to reconfigure N.
This seems more convenient for the user.

However, whether watching is in place is not the key to this issue,
because watching merely reveals the outcome.

If A through N belong to different users and N conflicts with all of
A through M, then even after the users of A-M observe the invalidation
through watching, they cannot restore their exclusive state, because
they will always conflict with N.

Thanks,
Sun Shaojie
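
To put a number on the reconfiguration cost in the A..N example, here is a
toy model in plain shell arithmetic on CPU bitmasks. It is not kernel code:
the function name invalidated_count and the policy names "all" and "last" are
made up for this sketch, and it assumes every listed sibling is currently a
valid partition root.

```shell
#!/bin/sh
# invalidated_count POLICY NEW_MASK SIBLING_MASK...
#   POLICY "all":  every sibling overlapping NEW_MASK is invalidated,
#                  and so is the newcomer.
#   POLICY "last": only the newcomer is invalidated.
# Masks are CPU bitmasks in decimal, e.g. 1 = CPU 0, 7 = CPUs 0-2.
# Prints how many partitions end up invalid and need reconfiguring.
invalidated_count() {
	policy=$1
	new=$2
	shift 2
	overlaps=0
	for mask in "$@"; do
		if [ $(( $new & $mask )) -ne 0 ]; then
			overlaps=$(( $overlaps + 1 ))
		fi
	done
	if [ "$overlaps" -eq 0 ]; then
		echo 0
	elif [ "$policy" = all ]; then
		echo $(( $overlaps + 1 ))
	else
		echo 1
	fi
}
```

With siblings on CPUs 0, 1 and 2 and a newcomer asking for CPUs 0-2,
`invalidated_count all 7 1 2 4` reports four partitions to fix, while
`invalidated_count last 7 1 2 4` reports one.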
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Longman,

On Mon, 24 Nov 2025 17:30:47, Waiman Long wrote:
>On 11/19/25 5:57 AM, Sun Shaojie wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>>
>> This patch applies only to the following two cases:
>>
>> Assume the machine has 4 CPUs (0-3).
>>
>>     root cgroup
>>        /    \
>>      A1      B1
>>
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>>
>>   Table 1.1: Before applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>>
>>   Table 1.2: After applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>>
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>>
>>   Table 2.1: Before applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>   #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>>
>>   Table 2.2: After applying this patch
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>   #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
>>
>> All other cases remain unaffected. For example, cgroup-v1.
>
>This patch is relatively simple. As others have pointed out, there is an
>inconsistency depending on the operation ordering.
>
>In the example above, the final configuration is A1:0-1 & B1:1-2. As the 
>cpu lists overlap, we can't have both of them as valid partition roots. 
>So either one of A1 or B1 is valid or they are both invalid. The current 
>code makes them both invalid no matter the operation ordering.  This 
>patch will make one of them valid given the operation ordering above. To 
>minimize partition invalidation, we will have to live with the fact that 
>it will be first-come first-serve as noted by Michal. I am not against 
>this, we just have to document it. However, the following operation 
>order will still make both of them invalid:
>
># echo "0-1" > A1/cpuset.cpus
># echo "2" > B1/cpuset.cpus
># echo "1-2" > B1/cpuset.cpus
># echo "root" > A1/cpuset.cpus.partition
># echo "root" > B1/cpuset.cpus.partition
>
>To follow the "first-come first-serve" rule, A1 should be valid and B1 
>invalid. That is the inconsistency with your current patch. To fix that, 
>we still need to relax the overlap checking rule similar to your v4 patch.

Thank you for your suggestion! Will update.

Thanks,
Sun Shaojie
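
The "first-come first-serve" rule discussed above can be reduced to a small
userspace model. This is only a sketch of the intended semantics, not of any
kernel code path: fcfs_states is a hypothetical helper, and the order of its
arguments stands for the order in which "root" is written to each cpuset's
cpuset.cpus.partition.

```shell
#!/bin/sh
# fcfs_states NAME:MASK...
# Requests are handled in the order given. A request becomes "root" only if
# its CPU bitmask does not overlap any mask granted earlier; otherwise it is
# reported as "root-invalid". Masks are decimal bitmasks (3 = CPUs 0-1,
# 6 = CPUs 1-2).
fcfs_states() {
	granted=0
	out=
	for req in "$@"; do
		name=${req%%:*}
		mask=${req#*:}
		if [ $(( $granted & $mask )) -eq 0 ]; then
			granted=$(( $granted | $mask ))
			out="$out $name=root"
		else
			out="$out $name=root-invalid"
		fi
	done
	echo "${out# }"
}
```

Calling `fcfs_states A1:3 B1:6` keeps A1 valid and marks B1 invalid; swapping
the arguments swaps the outcome, which is exactly the ordering dependency that
would have to be documented.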
[PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 1 week ago
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with its sibling partition, the sibling's partition state becomes invalid.
However, this invalidation is often unnecessary.

For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
under the root cgroup. Each cpuset is used by a single user (user-1 uses
A1, ..., user-m uses Am), and the partition states of these cpusets are
configured as follows:

                           root cgroup
        /             /                  \                 \
       A1            A2        ...       An                Am
     (root)        (root)      ...     (root) (root/root invalid/member)

Assume that A1 through Am have not set cpuset.cpus.exclusive. When
user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
states from A1 to An to change from root to root invalid, as shown
below.

                           root cgroup
        /              /                 \                 \
       A1             A2       ...       An                Am
 (root invalid) (root invalid) ... (root invalid) (root invalid/member)

This outcome is entirely undeserved for all users from A1 to An.

This patch prevents such outcomes by ensuring that modifications to
cpuset.cpus do not affect the partition state of other sibling cpusets.
Therefore, with this patch applied, when user-m configures Am's
cpuset.cpus to "0-127", the result will be as follows.

                           root cgroup
        /             /                  \                 \
       A1            A2        ...       An                Am
     (root)        (root)      ...     (root)     (root invalid/member)

It is worth noting that, since this patch enforces the exclusivity of
sibling cpusets, setting exclusivity now follows a "first-come,
first-served" principle.

For example, consider the following four steps: before applying this
patch, regardless of the order in which they are executed, the final
partition state of both A1 and B1 would always be "root invalid."

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |

After applying this patch, the first party to set "root" will maintain
its exclusive validity. As follows:

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
 #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |

 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
 #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
 #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |

In summary, if the current cpuset conflicts with its sibling cpusets on
exclusive CPUs (If a cpuset is exclusive and its exclusive CPUs are empty,
its allowed CPUs will be treated as exclusive CPUs), only the current
cpuset should bear the consequences.

Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
---
 kernel/cgroup/cpuset-internal.h               |  3 +
 kernel/cgroup/cpuset-v1.c                     | 19 ++++++
 kernel/cgroup/cpuset.c                        | 60 ++++++++++++-------
 .../selftests/cgroup/test_cpuset_prs.sh       | 12 ++--
 4 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 337608f408ce..c53111998432 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
+				struct cpuset *cs2) {return false; }
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..5aa0ac092ef6 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
+ *                                to legacy (v1)
+ * @cs1: first cpuset to check
+ * @cs2: second cpuset to check
+ *
+ * Returns: true if CPU exclusivity conflict exists, false otherwise
+ *
+ * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
+ */
+bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
+{
+	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+		return cpumask_intersects(cs1->cpus_allowed,
+					  cs2->cpus_allowed);
+
+	return false;
+}
+
 #ifdef CONFIG_PROC_PID_CPUSET
 /*
  * proc_cpuset_show()
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 52468d2c178a..e58dd26e074a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
  * Returns: true if CPU exclusivity conflict exists, false otherwise
  *
  * Conflict detection rules:
- * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
+ * For cgroup-v1:
+ *     see cpuset1_cpus_excl_conflict()
+ * For cgroup-v2:
+ * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive
  * 2. exclusive_cpus masks cannot intersect between cpusets
  * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
+ * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs
+ *    will be treated as exclusive CPUs; therefore, its allowed CPUs must not
+ *    intersect with another's exclusive CPUs.
  */
 static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 {
-	/* If either cpuset is exclusive, check if they are mutually exclusive */
-	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
+	/* For cgroup-v1 */
+	if (!cpuset_v2())
+		return cpuset1_cpus_excl_conflict(cs1, cs2);
+
+	/* If both cpusets are exclusive, they must be mutually exclusive */
+	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
 		return !cpusets_are_exclusive(cs1, cs2);
 
 	/* Exclusive_cpus cannot intersect */
@@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
 	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
 		return true;
 
+	/*
+	 * When a cpuset is exclusive and its exclusive CPUs are empty,
+	 * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
+	 */
+	if (is_cpu_exclusive(cs1) &&
+	    cpumask_empty(cs1->exclusive_cpus) &&
+	    cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
+		return true;
+
+	if (is_cpu_exclusive(cs2) &&
+	    cpumask_empty(cs2->exclusive_cpus) &&
+	    cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
+		return true;
+
 	return false;
 }
 
@@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
 					struct tmpmasks *tmp)
 {
 	int retval;
-	struct cpuset *parent = parent_cs(cs);
 
 	retval = validate_change(cs, trialcs);
 
 	if ((retval == -EINVAL) && cpuset_v2()) {
-		struct cgroup_subsys_state *css;
-		struct cpuset *cp;
-
 		/*
 		 * The -EINVAL error code indicates that partition sibling
 		 * CPU exclusivity rule has been violated. We still allow
 		 * the cpumask change to proceed while invalidating the
-		 * partition. However, any conflicting sibling partitions
-		 * have to be marked as invalid too.
+		 * partition.
 		 */
 		trialcs->prs_err = PERR_NOTEXCL;
-		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent) {
-			struct cpumask *xcpus = user_xcpus(trialcs);
-
-			if (is_partition_valid(cp) &&
-			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
-				rcu_read_unlock();
-				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
-				rcu_read_lock();
-			}
-		}
-		rcu_read_unlock();
 		retval = 0;
 	}
 	return retval;
@@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_tmpmasks(&tmp))
 		return -ENOMEM;
 
-	compute_trialcs_excpus(trialcs, cs);
-	trialcs->prs_err = PERR_NONE;
+	/*
+	 * If there is an exclusive CPUs conflict with the siblings,
+	 * we still allow the cpumask change to proceed while
+	 * invalidating the partition.
+	 */
+	if (compute_trialcs_excpus(trialcs, cs))
+		trialcs->prs_err = PERR_NOTEXCL;
+	else
+		trialcs->prs_err = PERR_NONE;
 
 	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
 	if (retval < 0)
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a17256d9f88a..75154e22c702 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -269,7 +269,7 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
 	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
-	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
+	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
 	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
 	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
@@ -318,7 +318,7 @@ TEST_MATRIX=(
 	# Invalid to valid local partition direct transition tests
 	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
 	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
-	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0"
 	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
 
 	# Local partition invalidation tests
@@ -388,10 +388,10 @@ TEST_MATRIX=(
 	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
 	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
 
-	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
-	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
-	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
+	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
+	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
+	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
+	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
 
 	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
 	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
-- 
2.25.1
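
For readers who want to try the four cgroup-v2 rules listed in the
cpus_excl_conflict() comment without building a kernel, the following is a
userspace model on plain bitmasks. It rests on assumptions that the hunks
above do not show: that "mutually exclusive" in rule 1 means the requested
exclusive sets (exclusive_cpus if non-empty, otherwise cpus_allowed) do not
intersect, and that the subset check in rule 3 ignores an empty cpus_allowed.
All function names are invented for the sketch.

```shell
#!/bin/sh
# Masks are decimal CPU bitmasks (3 = CPUs 0-1, 6 = CPUs 1-2); 0 is "empty".
# A cpuset is described by three values: EXCLUSIVE_FLAG ALLOWED EXCL_CPUS.

# user_xcpus ALLOWED EXCL_CPUS: EXCL_CPUS if set, otherwise ALLOWED (assumed).
user_xcpus() {
	if [ "$2" -ne 0 ]; then echo "$2"; else echo "$1"; fi
}

# is_subset A B: true if A is non-empty and every CPU in A is also in B.
is_subset() {
	[ "$1" -ne 0 ] && [ $(( $1 & ~$2 )) -eq 0 ]
}

# v2_excl_conflict E1 A1 X1 E2 A2 X2: exit status 0 means "conflict".
v2_excl_conflict() {
	e1=$1; a1=$2; x1=$3
	e2=$4; a2=$5; x2=$6

	# Rule 1: both exclusive, so their requested exclusive sets must not overlap.
	if [ "$e1" -eq 1 ] && [ "$e2" -eq 1 ]; then
		u1=$(user_xcpus "$a1" "$x1")
		u2=$(user_xcpus "$a2" "$x2")
		if [ $(( $u1 & $u2 )) -ne 0 ]; then return 0; else return 1; fi
	fi
	# Rule 2: exclusive_cpus masks cannot intersect.
	[ $(( $x1 & $x2 )) -ne 0 ] && return 0
	# Rule 3: allowed CPUs cannot be a subset of the other's exclusive CPUs.
	is_subset "$a1" "$x2" && return 0
	is_subset "$a2" "$x1" && return 0
	# Rule 4: an exclusive cpuset with empty exclusive_cpus must not have
	# allowed CPUs that intersect the other's exclusive_cpus.
	[ "$e1" -eq 1 ] && [ "$x1" -eq 0 ] && [ $(( $a1 & $x2 )) -ne 0 ] && return 0
	[ "$e2" -eq 1 ] && [ "$x2" -eq 0 ] && [ $(( $a2 & $x1 )) -ne 0 ] && return 0
	return 1
}
```

Under this model, `v2_excl_conflict 1 3 0 0 1 0` (A1 is a root on CPUs 0-1,
B1 is a member asking for CPU 0) reports no conflict, whereas
`v2_excl_conflict 1 3 0 1 6 0` (both roots, CPUs 0-1 and 1-2) reports one.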
Re: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 1 month, 2 weeks ago
Hello Shaojie.

On Mon, Dec 01, 2025 at 05:38:06PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary.
> 
> For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
> under the root cgroup. Each cpuset is used by a single user(user-1 use
> A1, ... , user-m use Am), and the partition states of these cpusets are
> configured as follows:
> 
>                            root cgroup
>         /             /                  \                 \
>        A1            A2        ...       An                Am
>      (root)        (root)      ...     (root) (root/root invalid/member)
> 
> Assume that A1 through Am have not set cpuset.cpus.exclusive. When
> user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
> states from A1 to An to change from root to root invalid, as shown
> below.
> 
>                            root cgroup
>         /              /                 \                 \
>        A1             A2       ...       An                Am
>  (root invalid) (root invalid) ... (root invalid) (root invalid/member)
> 
> This outcome is entirely undeserved for all users from A1 to An.

s/cpuset.cpus/memory.max/ 

When the permissions are such that the last (any) sibling can come and
claim so much as to cause overcommit, then it can set up a large limit and
(potentially) reclaim from others.

s/cpuset.cpus/memory.min/

Here the overcommit is handled by recalculating the effective values of
memory.min; again, one sibling can skew them toward itself and reduce every
other sibling's effective value.

The above are not exact analogies because the first of them is Limits, the
second is Protections and cpusets are Allocations (referring to Resource
Distribution Models from Documentation/admin-guide/cgroup-v2.rst).

But the advice for getting some guarantees would be the same in all cases --
if some guarantees are expected, the permissions (of the respective cgroup
attributes) should be configured so that they decouple the owner of the
cgroup from the owner of the resource (i.e. Ai/cpuset.cpus belongs to
root or there's a middle-level cgroup that'd cap each of the siblings
individually).


> After applying this patch, the first party to set "root" will maintain
> its exclusive validity. As follows:
> 
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
> 
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
>  #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
>  #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |

I'm worried that the ordering dependency would lead to situations where
users may not be immediately aware that their config is overcommitting the
system. Consider that CPUs are vital for A1 but B1 can somehow survive the
degraded state: depending on the starting order, the system may either
run fine (A1 valid) or fail because of A1 (A1 invalid).

I'm curious about Waiman's take.

Thanks,
Michal
Re: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 1 month, 2 weeks ago
On 12/22/25 10:26 AM, Michal Koutný wrote:
> Hello Shaojie.
>
> On Mon, Dec 01, 2025 at 05:38:06PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary.
>>
>> For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
>> under the root cgroup. Each cpuset is used by a single user(user-1 use
>> A1, ... , user-m use Am), and the partition states of these cpusets are
>> configured as follows:
>>
>>                             root cgroup
>>          /             /                  \                 \
>>         A1            A2        ...       An                Am
>>       (root)        (root)      ...     (root) (root/root invalid/member)
>>
>> Assume that A1 through Am have not set cpuset.cpus.exclusive. When
>> user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
>> states from A1 to An to change from root to root invalid, as shown
>> below.
>>
>>                             root cgroup
>>          /              /                 \                 \
>>         A1             A2       ...       An                Am
>>   (root invalid) (root invalid) ... (root invalid) (root invalid/member)
>>
>> This outcome is entirely undeserved for all users from A1 to An.
> s/cpuset.cpus/memory.max/
>
> When the permissions are such that the last (any) sibling can come and
> claim so much to cause overcommit, then it can set up large limit and
> (potentially) reclaim from others.
>
> s/cpuset.cpus/memory.min/
>
> Here is the overcommit approached by recalculating effective values of
> memory.min, again one sibling can skew toward itself and reduce every
> other's effective value.
>
> Above are not exact analogies because first of them is Limits, the
> second is Protections and cpusets are Allocations (refering to Resource
> Distribution Models from Documentation/admin-guide/cgroup-v2.rst).
>
> But the advice to get some guarantees would be same in all cases -- if
> some guarantees are expected, the permissions (of respective cgroup
> attributes) should be configured so that it decouples the owner of the
> cgroup from the owner of the resource (i.e. Ai/cpuset.cpus belongs to
> root or there's a middle level cgroup that'd cap each of the siblings
> individually).
>
From a sibling's point of view, CPUs in partitions are exclusive. A cpuset
either has all the requested CPUs to form a partition (assuming that at
least one can be granted from the parent cpuset) or it doesn't have all
of them and fails to form a valid partition. This differs from memory,
where a cgroup can have less memory than requested and still work fine.

Anyway, I consider using cpuset.cpus to form a partition to be legacy; it
is supported for backward compatibility reasons. The proper way to form a
partition now is to use cpuset.cpus.exclusive, and setting it can fail if
it conflicts with siblings.

When only cpuset.cpus is used to form partitions, the cpuset.cpus value is
treated the same as cpuset.cpus.exclusive if a valid partition is
formed. In that sense, the examples listed in the patch will have the
same result if cpuset.cpus.exclusive is used instead of cpuset.cpus. The
difference is that writing to cpuset.cpus.exclusive will fail
instead of forming an invalid partition as in the case of cpuset.cpus.

>> After applying this patch, the first party to set "root" will maintain
>> its exclusive validity. As follows:
>>
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>   #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
>>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
>>
>>   Step                                       | A1's prstate | B1's prstate |
>>   #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
>>   #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
>>   #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
>>   #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
> I'm worried that the ordering dependency would lead to situations where
> users may not be immediately aware their config is overcommitting the system.
> Consider that CPUs are vital for A1 but B1 can somehow survive the
> degraded state, depending on the starting order the system may either
> run fine (A1 valid) or fail because of A1.
>
> I'm curious about Waiman's take.

That is why I recommend that users use cpuset.cpus.exclusive to form
partitions, as they get early feedback if they are overcommitting. Of
course, setting cpuset.cpus.exclusive without failure still doesn't
guarantee the formation of a valid partition if none of the exclusive
CPUs can be granted from the parent.

Cheers,
Longman
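
The "early feedback" workflow recommended above might look roughly like this
from a shell. It is a sketch under assumptions: claim_partition is a
hypothetical helper, the cgroup directory is passed in by the caller, and the
only thing relied on is that a rejected write to cpuset.cpus.exclusive makes
the write fail. The final state is still read back because, as noted, a
successful write does not guarantee a valid partition.

```shell
#!/bin/sh
# claim_partition CGROUP_DIR CPU_LIST
# Write CPU_LIST to cpuset.cpus.exclusive first. Only if that write succeeds,
# request a partition root and print the state the kernel reports back.
claim_partition() {
	cg=$1
	cpus=$2
	if ! echo "$cpus" 2>/dev/null > "$cg/cpuset.cpus.exclusive"; then
		echo "write of '$cpus' to $cg/cpuset.cpus.exclusive failed" >&2
		return 1
	fi
	echo root > "$cg/cpuset.cpus.partition" || return 1
	# May print "root" or "root invalid (<reason>)".
	cat "$cg/cpuset.cpus.partition"
}
```

A caller would check both the exit status (the request was rejected up front)
and the printed state (the request was accepted but the partition is
invalid), for example `claim_partition /sys/fs/cgroup/A1 0-1`.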

Re: [PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Waiman Long 1 month, 2 weeks ago
On 12/1/25 4:38 AM, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary.
>
> For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
> under the root cgroup. Each cpuset is used by a single user(user-1 use
> A1, ... , user-m use Am), and the partition states of these cpusets are
> configured as follows:
>
>                             root cgroup
>          /             /                  \                 \
>         A1            A2        ...       An                Am
>       (root)        (root)      ...     (root) (root/root invalid/member)
>
> Assume that A1 through Am have not set cpuset.cpus.exclusive. When
> user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
> states from A1 to An to change from root to root invalid, as shown
> below.
>
>                             root cgroup
>          /              /                 \                 \
>         A1             A2       ...       An                Am
>   (root invalid) (root invalid) ... (root invalid) (root invalid/member)
>
> This outcome is entirely undeserved for all users from A1 to An.
>
> This patch prevents such outcomes by ensuring that modifications to
> cpuset.cpus do not affect the partition state of other sibling cpusets.
> Therefore, with this patch applied, when user-m configures Am's
> cpuset.cpus to "0-127", the result will be as follows.
>
>                             root cgroup
>          /             /                  \                 \
>         A1            A2        ...       An                Am
>       (root)        (root)      ...     (root)     (root invalid/member)
>
> It is worth noting that, since this patch enforces the exclusivity of
> sibling cpusets, setting exclusivity now follows a "first-come,
> first-served" principle.
>
> For example, consider the following four steps: before applying this
> patch, regardless of the order in which they are executed, the final
> partition state of both A1 and B1 would always be "root invalid."
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>
> After applying this patch, the first party to set "root" will maintain
> its exclusive validity. As follows:
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
>   #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
>   #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
>
> In summary, if the current cpuset conflicts with its sibling cpusets on
> exclusive CPUs (if a cpuset is exclusive and its exclusive CPUs are empty,
> its allowed CPUs are treated as exclusive CPUs), only the current
> cpuset should bear the consequences.
>
> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>

I agree with you that it is probably not a good idea to invalidate
partitions whenever there is a conflict. However, I have a different
idea of how to do it. I am going to post another patch to show my idea.
Let me know what you think about it and whether it meets your needs.

Cheers,
Longman
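
To make the "first-come, first-served" rule from the quoted commit
message concrete, here is a minimal userspace sketch in C. It is a toy
model, not kernel code: struct toy_cpuset, the TOY_* states and all three
helpers are hypothetical names, CPU sets are plain 64-bit masks, and the
only rule modelled is that a cpuset asking for (or already holding)
"root" invalidates itself when its CPUs overlap a sibling that is
currently a valid root.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "member", "root" and "root invalid". */
enum toy_prstate { TOY_MEMBER, TOY_ROOT, TOY_ROOT_INVALID };

struct toy_cpuset {
	uint64_t cpus;            /* bit n set => CPU n requested */
	enum toy_prstate state;
};

/* A conflict exists only against a sibling that is a valid root. */
static int toy_conflicts(const struct toy_cpuset *cs,
			 const struct toy_cpuset *sibling)
{
	return sibling->state == TOY_ROOT && (cs->cpus & sibling->cpus) != 0;
}

/* Models: echo "root" > cs/cpuset.cpus.partition */
static void toy_request_root(struct toy_cpuset *cs,
			     const struct toy_cpuset *sibling)
{
	cs->state = toy_conflicts(cs, sibling) ? TOY_ROOT_INVALID : TOY_ROOT;
}

/*
 * Models: echo <mask> > cs/cpuset.cpus
 * The sibling is const on purpose: in this model a cpuset.cpus write can
 * only change the writer's own partition state.
 */
static void toy_set_cpus(struct toy_cpuset *cs,
			 const struct toy_cpuset *sibling, uint64_t cpus)
{
	cs->cpus = cpus;
	if (cs->state != TOY_MEMBER)
		toy_request_root(cs, sibling);
}
```

Replaying the two post-patch tables with this model (one cpuset on mask
0x3, the other on 0x6) leaves whichever cpuset became root first as
TOY_ROOT and the later one as TOY_ROOT_INVALID.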

> ---
>   kernel/cgroup/cpuset-internal.h               |  3 +
>   kernel/cgroup/cpuset-v1.c                     | 19 ++++++
>   kernel/cgroup/cpuset.c                        | 60 ++++++++++++-------
>   .../selftests/cgroup/test_cpuset_prs.sh       | 12 ++--
>   4 files changed, 65 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index 337608f408ce..c53111998432 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>   			    struct cpumask *new_cpus, nodemask_t *new_mems,
>   			    bool cpus_updated, bool mems_updated);
>   int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
>   #else
>   static inline void fmeter_init(struct fmeter *fmp) {}
>   static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
> @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>   			    bool cpus_updated, bool mems_updated) {}
>   static inline int cpuset1_validate_change(struct cpuset *cur,
>   				struct cpuset *trial) { return 0; }
> +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
> +				struct cpuset *cs2) { return false; }
>   #endif /* CONFIG_CPUSETS_V1 */
>   
>   #endif /* __CPUSET_INTERNAL_H */
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 12e76774c75b..5aa0ac092ef6 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
>   	return ret;
>   }
>   
> +/*
> + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
> + *                                for legacy (v1)
> + * @cs1: first cpuset to check
> + * @cs2: second cpuset to check
> + *
> + * Returns: true if CPU exclusivity conflict exists, false otherwise
> + *
> + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
> + */
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +{
> +	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +		return cpumask_intersects(cs1->cpus_allowed,
> +					  cs2->cpus_allowed);
> +
> +	return false;
> +}
> +
>   #ifdef CONFIG_PROC_PID_CPUSET
>   /*
>    * proc_cpuset_show()
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..e58dd26e074a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>    * Returns: true if CPU exclusivity conflict exists, false otherwise
>    *
>    * Conflict detection rules:
> - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> + * For cgroup-v1:
> + *     see cpuset1_cpus_excl_conflict()
> + * For cgroup-v2:
> + * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive
>    * 2. exclusive_cpus masks cannot intersect between cpusets
>    * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
> + * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs
> + *    will be treated as exclusive CPUs; therefore, its allowed CPUs must not
> + *    intersect with another's exclusive CPUs.
>    */
>   static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>   {
> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +	/* For cgroup-v1 */
> +	if (!cpuset_v2())
> +		return cpuset1_cpus_excl_conflict(cs1, cs2);
> +
> +	/* If both cpusets are exclusive, check if they are mutually exclusive */
> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>   		return !cpusets_are_exclusive(cs1, cs2);
>   
>   	/* Exclusive_cpus cannot intersect */
> @@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>   	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
>   		return true;
>   
> +	/*
> +	 * When a cpuset is exclusive and its exclusive CPUs are empty,
> +	 * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
> +	 */
> +	if (is_cpu_exclusive(cs1) &&
> +	    cpumask_empty(cs1->exclusive_cpus) &&
> +	    cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
> +		return true;
> +
> +	if (is_cpu_exclusive(cs2) &&
> +	    cpumask_empty(cs2->exclusive_cpus) &&
> +	    cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
> +		return true;
> +
>   	return false;
>   }
>   
> @@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>   					struct tmpmasks *tmp)
>   {
>   	int retval;
> -	struct cpuset *parent = parent_cs(cs);
>   
>   	retval = validate_change(cs, trialcs);
>   
>   	if ((retval == -EINVAL) && cpuset_v2()) {
> -		struct cgroup_subsys_state *css;
> -		struct cpuset *cp;
> -
>   		/*
>   		 * The -EINVAL error code indicates that partition sibling
>   		 * CPU exclusivity rule has been violated. We still allow
>   		 * the cpumask change to proceed while invalidating the
> -		 * partition. However, any conflicting sibling partitions
> -		 * have to be marked as invalid too.
> +		 * partition.
>   		 */
>   		trialcs->prs_err = PERR_NOTEXCL;
> -		rcu_read_lock();
> -		cpuset_for_each_child(cp, css, parent) {
> -			struct cpumask *xcpus = user_xcpus(trialcs);
> -
> -			if (is_partition_valid(cp) &&
> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
> -				rcu_read_unlock();
> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> -				rcu_read_lock();
> -			}
> -		}
> -		rcu_read_unlock();
>   		retval = 0;
>   	}
>   	return retval;
> @@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>   	if (alloc_tmpmasks(&tmp))
>   		return -ENOMEM;
>   
> -	compute_trialcs_excpus(trialcs, cs);
> -	trialcs->prs_err = PERR_NONE;
> +	/*
> +	 * If there is an exclusive CPU conflict with the siblings,
> +	 * we still allow the cpumask change to proceed while
> +	 * invalidating the partition.
> +	 */
> +	if (compute_trialcs_excpus(trialcs, cs))
> +		trialcs->prs_err = PERR_NOTEXCL;
> +	else
> +		trialcs->prs_err = PERR_NONE;
>   
>   	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
>   	if (retval < 0)
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..75154e22c702 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> -	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> +	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
>   	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
>   	# Invalid to valid local partition direct transition tests
>   	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
>   	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> -	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> +	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0"
>   	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>   
>   	# Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
>   	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
>   	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>   
> -	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> -	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> -	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> -	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> +	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
> +	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
> +	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
> +	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>   
>   	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
>   	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"
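
The four cgroup-v2 conflict rules listed in the cpus_excl_conflict()
comment of the quoted patch can be exercised in userspace with the sketch
below. It is a model under assumptions, not the kernel function: CPU
masks are 64-bit integers, struct toy_cpuset and the toy_* helpers are
hypothetical names, rules 2 and 3 are reconstructed from the comment
because the diff context elides part of the function body, and the
non-empty guard in toy_subset_nonempty() is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_cpuset {
	uint64_t cpus_allowed;    /* models cpuset.cpus */
	uint64_t exclusive_cpus;  /* models cpuset.cpus.exclusive */
	bool cpu_exclusive;       /* models is_cpu_exclusive() */
};

/* Rule 4 wording: with empty exclusive CPUs, allowed CPUs act as exclusive. */
static uint64_t toy_xcpus(const struct toy_cpuset *cs)
{
	return cs->exclusive_cpus ? cs->exclusive_cpus : cs->cpus_allowed;
}

/* True when sub is non-empty and every CPU in sub is also in super. */
static bool toy_subset_nonempty(uint64_t sub, uint64_t super)
{
	return sub != 0 && (sub & ~super) == 0;
}

static bool toy_excl_conflict(const struct toy_cpuset *a,
			      const struct toy_cpuset *b)
{
	/* Rule 1: two exclusive cpusets must not share any CPU. */
	if (a->cpu_exclusive && b->cpu_exclusive)
		return (toy_xcpus(a) & toy_xcpus(b)) != 0;

	/* Rule 2: exclusive_cpus masks must not intersect. */
	if ((a->exclusive_cpus & b->exclusive_cpus) != 0)
		return true;

	/* Rule 3: allowed CPUs must not be swallowed by the other's exclusive CPUs. */
	if (toy_subset_nonempty(a->cpus_allowed, b->exclusive_cpus) ||
	    toy_subset_nonempty(b->cpus_allowed, a->exclusive_cpus))
		return true;

	/* Rule 4: an exclusive cpuset with empty exclusive_cpus guards its allowed CPUs. */
	if (a->cpu_exclusive && a->exclusive_cpus == 0 &&
	    (a->cpus_allowed & b->exclusive_cpus) != 0)
		return true;
	if (b->cpu_exclusive && b->exclusive_cpus == 0 &&
	    (b->cpus_allowed & a->exclusive_cpus) != 0)
		return true;

	return false;
}
```

In this model an exclusive A1 on CPUs 0-1 and a non-exclusive B1 asking
for CPU 0 do not conflict, which is the Case 1 situation the patch wants
to leave alone; two exclusive cpusets on 0-1 and 1-2 do conflict.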
[PING][PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict
Posted by Sun Shaojie 1 month, 3 weeks ago
Hi, Longman,
 
Just a friendly ping regarding the patch "[PATCH v6] cpuset: Avoid
invalidating sibling partitions on cpuset.cpus conflict" sent on
[Mon,  1 Dec 2025 17:38:06 +0800].
Link: https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.cn/

Could you please take a look when you have a moment? We'd appreciate any
initial feedback or suggestions you might have.

Thank you again for your time and consideration.

Thanks,
Sun Shaojie
Re: [PING][PATCH v6] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict
Posted by Waiman Long 1 month, 2 weeks ago
On 12/17/25 4:45 AM, Sun Shaojie wrote:
> Hi, Longman,
>   
> Just a friendly ping regarding the patch "[PATCH v6] cpuset: Avoid
> invalidating sibling partitions on cpuset.cpus conflict" sent on
> [Mon,  1 Dec 2025 17:38:06 +0800].
> Link: https://lore.kernel.org/cgroups/20251201093806.107157-1-sunshaojie@kylinos.cn/
>
> Could you please take a look when you have a moment? We'd appreciate any
> initial feedback or suggestions you might have.
>
> Thank you again for your time and consideration.

I am sorry for being late in reviewing your patch; I have been busy for the
last few weeks. I will try to review it later this week.

Cheers,
Longman
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/19 18:57, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
> 
> This patch applies only to the following two cases:
> 
> Assume the machine has 4 CPUs (0-3).
> 
>    root cgroup
>       /    \
>     A1      B1
> 
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
> 
>  Table 1.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
> 
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
> 
>  Table 1.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
> 
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
> 
>  Table 2.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
> 
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
> 
>  Table 2.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
> 
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".
> 
> All other cases remain unaffected. For example, cgroup-v1.
> 
> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn>
> ---
>  kernel/cgroup/cpuset.c                        | 19 +------------------
>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>  2 files changed, 5 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..f6a834335ebf 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>  					struct tmpmasks *tmp)
>  {
>  	int retval;
> -	struct cpuset *parent = parent_cs(cs);
>  
>  	retval = validate_change(cs, trialcs);
>  
>  	if ((retval == -EINVAL) && cpuset_v2()) {
> -		struct cgroup_subsys_state *css;
> -		struct cpuset *cp;
> -
>  		/*
>  		 * The -EINVAL error code indicates that partition sibling
>  		 * CPU exclusivity rule has been violated. We still allow
>  		 * the cpumask change to proceed while invalidating the
> -		 * partition. However, any conflicting sibling partitions
> -		 * have to be marked as invalid too.
> +		 * partition.
>  		 */
>  		trialcs->prs_err = PERR_NOTEXCL;
> -		rcu_read_lock();
> -		cpuset_for_each_child(cp, css, parent) {
> -			struct cpumask *xcpus = user_xcpus(trialcs);
> -
> -			if (is_partition_valid(cp) &&
> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
> -				rcu_read_unlock();
> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> -				rcu_read_lock();
> -			}
> -		}
> -		rcu_read_unlock();
>  		retval = 0;
>  	}
>  	return retval;

If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
disable its siblings' partitions.

-- 
Best regards,
Ridong
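
For readers following the hunk quoted above, the loop being removed can
be modelled in userspace roughly as follows. This is a sketch under
assumptions, not the kernel code: the rcu_read_lock()/rcu_read_unlock()
pairs and the call to update_parent_effective_cpumask() are replaced by
simply clearing a flag, and struct toy_sibling and
toy_invalidate_conflicting() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_sibling {
	uint64_t effective_xcpus;  /* CPUs the sibling partition currently owns */
	bool partition_valid;      /* models is_partition_valid() */
};

/*
 * Pre-patch behaviour: every valid sibling partition whose exclusive CPUs
 * intersect the writer's requested CPUs is invalidated.
 * Returns how many siblings were invalidated.
 */
static size_t toy_invalidate_conflicting(struct toy_sibling *sib, size_t n,
					 uint64_t writer_xcpus)
{
	size_t hit = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sib[i].partition_valid &&
		    (writer_xcpus & sib[i].effective_xcpus) != 0) {
			sib[i].partition_valid = false;
			hit++;
		}
	}
	return hit;
}
```

With the patch applied this pass is skipped, so a conflicting write only
sets PERR_NOTEXCL on the writer's own trial cpuset and the siblings keep
their state.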

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 3 weeks ago
Hi, Ridong,

On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
>On 2025/11/19 18:57, Sun Shaojie wrote:
>>  kernel/cgroup/cpuset.c                        | 19 +------------------
>>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>>  2 files changed, 5 insertions(+), 21 deletions(-)
>> 
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 52468d2c178a..f6a834335ebf 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>>  					struct tmpmasks *tmp)
>>  {
>>  	int retval;
>> -	struct cpuset *parent = parent_cs(cs);
>>  
>>  	retval = validate_change(cs, trialcs);
>>  
>>  	if ((retval == -EINVAL) && cpuset_v2()) {
>> -		struct cgroup_subsys_state *css;
>> -		struct cpuset *cp;
>> -
>>  		/*
>>  		 * The -EINVAL error code indicates that partition sibling
>>  		 * CPU exclusivity rule has been violated. We still allow
>>  		 * the cpumask change to proceed while invalidating the
>> -		 * partition. However, any conflicting sibling partitions
>> -		 * have to be marked as invalid too.
>> +		 * partition.
>>  		 */
>>  		trialcs->prs_err = PERR_NOTEXCL;
>> -		rcu_read_lock();
>> -		cpuset_for_each_child(cp, css, parent) {
>> -			struct cpumask *xcpus = user_xcpus(trialcs);
>> -
>> -			if (is_partition_valid(cp) &&
>> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
>> -				rcu_read_unlock();
>> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
>> -				rcu_read_lock();
>> -			}
>> -		}
>> -		rcu_read_unlock();
>>  		retval = 0;
>>  	}
>>  	return retval;
>
>If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
>effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
>disable its siblings' partitions.

I have carefully considered the scenario where the parent's effective CPUs are
empty, which corresponds to the following two cases (after applying this patch).

   root cgroup
        |
       A1
      /  \
    A2    A3

Case 1:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition

 After step #4, 

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0-1          |              |
 effective_cpus |              | 0-1          |              |
 prstate        | root         | root         | member       |

 After step #4, A3's effective CPUs are empty.

 #5> echo "0-1" > A3/cpuset.cpus

 After step #5,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0-1          | 0-1          |
 effective_cpus |              | 0-1          |              |
 prstate        | root         | root         | member       |

This patch affects step #5. After step #5, A3's effective CPUs are still empty.
Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus),
it is acceptable for them to remain empty after step #5. Moreover, if A3 is
aware that its parent's effective CPUs are empty, it should understand that
the CPUs it requests may not be granted.

Case 2:
 Step:
 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 #5> echo "1" > A3/cpuset.cpus
 #6> echo "root" > A3/cpuset.cpus.partition

 After step #6,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0            | 1            |
 effective_cpus |              | 0            | 1            |
 prstate        | root         | root         | root         |

 #7> echo "0-1" > A3/cpuset.cpus

 After step #7,

                |      A1      |      A2      |      A3      |
 cpus_allowed   | 0-1          | 0            | 0-1          |
 effective_cpus | 1            | 0            | 1            |
 prstate        | root         | root         | root invalid |

This patch affects step #7. After step #7, A3 affects only itself, changing
from "root" to "root invalid". However, since its effective CPUs remain CPU 1
both before and after step #7, it does no harm to leave A2 valid.

The purpose of this patch is to ensure that modifying cpuset.cpus does not 
disable its siblings' partitions.


Thanks,
Sun Shaojie
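
The effective_cpus rows in the two tables above can be reproduced with a
small userspace model. This is a simplified sketch, not the kernel's
algorithm: it assumes a valid child partition removes exactly its
cpus_allowed from the parent, that every other child gets the
intersection of its request with what the parent has left, and it ignores
cpuset.cpus.exclusive, hotplug and any inheritance of the parent's mask.
struct toy_child and both helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_child {
	uint64_t cpus_allowed;   /* bit n set => CPU n requested */
	bool valid_partition;    /* true for "root"; false for member or "root invalid" */
};

/* CPUs left to the parent partition after its valid child partitions take theirs. */
static uint64_t toy_parent_effective(uint64_t parent_cpus,
				     const struct toy_child *child, size_t n)
{
	uint64_t left = parent_cpus;
	size_t i;

	for (i = 0; i < n; i++)
		if (child[i].valid_partition)
			left &= ~child[i].cpus_allowed;
	return left;
}

/* Effective CPUs of child idx under the simplifying assumptions above. */
static uint64_t toy_child_effective(uint64_t parent_cpus,
				    const struct toy_child *child, size_t n,
				    size_t idx)
{
	if (child[idx].valid_partition)
		return child[idx].cpus_allowed;
	return child[idx].cpus_allowed &
	       toy_parent_effective(parent_cpus, child, n);
}
```

Feeding in Case 2 after step #7 (A1 on CPUs 0-1, A2 a valid root on CPU
0, A3 invalid but requesting CPUs 0-1) gives A1 and A3 an effective mask
of CPU 1 and A2 CPU 0, matching the table.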
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/20 21:07, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 08:51:30, Chen Ridong wrote:
>> On 2025/11/19 18:57, Sun Shaojie wrote:
>>>  kernel/cgroup/cpuset.c                        | 19 +------------------
>>>  .../selftests/cgroup/test_cpuset_prs.sh       |  7 ++++---
>>>  2 files changed, 5 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 52468d2c178a..f6a834335ebf 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -2411,34 +2411,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>>>  					struct tmpmasks *tmp)
>>>  {
>>>  	int retval;
>>> -	struct cpuset *parent = parent_cs(cs);
>>>  
>>>  	retval = validate_change(cs, trialcs);
>>>  
>>>  	if ((retval == -EINVAL) && cpuset_v2()) {
>>> -		struct cgroup_subsys_state *css;
>>> -		struct cpuset *cp;
>>> -
>>>  		/*
>>>  		 * The -EINVAL error code indicates that partition sibling
>>>  		 * CPU exclusivity rule has been violated. We still allow
>>>  		 * the cpumask change to proceed while invalidating the
>>> -		 * partition. However, any conflicting sibling partitions
>>> -		 * have to be marked as invalid too.
>>> +		 * partition.
>>>  		 */
>>>  		trialcs->prs_err = PERR_NOTEXCL;
>>> -		rcu_read_lock();
>>> -		cpuset_for_each_child(cp, css, parent) {
>>> -			struct cpumask *xcpus = user_xcpus(trialcs);
>>> -
>>> -			if (is_partition_valid(cp) &&
>>> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
>>> -				rcu_read_unlock();
>>> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
>>> -				rcu_read_lock();
>>> -			}
>>> -		}
>>> -		rcu_read_unlock();
>>>  		retval = 0;
>>>  	}
>>>  	return retval;
>>
>> If we remove this logic, there is a scenario where the parent (a partition) could end up with empty
>> effective CPUs. This means the corresponding CS will also have empty effective CPUs and thus fail to
>> disable its siblings' partitions.
> 
> I have carefully considered the scenario where the parent's effective CPUs are
> empty, which corresponds to the following two cases (after applying this patch).
> 
>    root cgroup
>         |
>        A1
>       /  \
>     A2    A3
> 
> Case 1:
>  Step:
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0-1" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
> 
>  After step #4, 
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0-1          |              |
>  effective_cpus |              | 0-1          |              |
>  prstate        | root         | root         | member       |
> 
>  After step #4, A3's effective CPUs are empty.
> 

That may be an unexpected corner case.

>  #5> echo "0-1" > A3/cpuset.cpus
> 

If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
A5, ...) afterward. However, prior to your patch, this migration was allowed.

>  After step #5,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0-1          | 0-1          |
>  effective_cpus |              | 0-1          |              |
>  prstate        | root         | root         | member       |
> 
> This patch affects step #5. After step #5, A3's effective CPUs are still empty.
> Since A3's effective CPUs can be empty before step #5 (setting cpuset.cpus),
> it is acceptable for them to remain empty after step #5. Moreover, if A3 is
> aware that its parent's effective CPUs are empty, it should understand that
> the CPUs it requests may not be granted.
> 
> Case 2:
>  Step:
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
>  #5> echo "1" > A3/cpuset.cpus
>  #6> echo "root" > A3/cpuset.cpus.partition
> 
>  After step #6,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0            | 1            |
>  effective_cpus |              | 0            | 1            |
>  prstate        | root         | root         | root         |
> 
>  #7> echo "0-1" > A3/cpuset.cpus
> 
>  After step #7,
> 
>                 |      A1      |      A2      |      A3      |
>  cpus_allowed   | 0-1          | 0            | 0-1          |
>  effective_cpus | 1            | 0            | 1            |
>  prstate        | root         | root         | root invalid |
> 
> This patch affects step #7. After step #7, A3 affects only itself, changing
> from "root" to "root invalid". However, since its effective CPUs remain CPU 1
> both before and after step #7, it does no harm to leave A2 valid.
> 
> The purpose of this patch is to ensure that modifying cpuset.cpus does not 
> disable its siblings' partitions.
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Ridong,

On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>On 2025/11/20 21:07, Sun Shaojie wrote:
>> I have carefully considered the scenario where the parent's effective CPUs are
>> empty, which corresponds to the following two cases (after applying this patch).
>> 
>>    root cgroup
>>         |
>>        A1
>>       /  \
>>     A2    A3
>> 
>> Case 1:
>>  Step:
>>  #1> echo "0-1" > A1/cpuset.cpus
>>  #2> echo "root" > A1/cpuset.cpus.partition
>>  #3> echo "0-1" > A2/cpuset.cpus
>>  #4> echo "root" > A2/cpuset.cpus.partition
>> 
>>  After step #4, 
>> 
>>                 |      A1      |      A2      |      A3      |
>>  cpus_allowed   | 0-1          | 0-1          |              |
>>  effective_cpus |              | 0-1          |              |
>>  prstate        | root         | root         | member       |
>> 
>>  After step #4, A3's effective CPUs are empty.
>> 
>
>That may be an unexpected corner case.
>
>>  #5> echo "0-1" > A3/cpuset.cpus
>> 
>
>If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>A5, ...) afterward. However, prior to your patch, this migration was allowed.

Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
And what parameters should be configured for A1's cpuset.cpus?
Could you provide a specific example?

Additionally, processes cannot be migrated into a cgroup whose 
cpuset.cpus.effective is empty. However, this patch does not modify this behavior.

So why does applying this patch enable such migration?

>>  After step #5,
>> 
>>                 |      A1      |      A2      |      A3      |
>>  cpus_allowed   | 0-1          | 0-1          | 0-1          |
>>  effective_cpus |              | 0-1          |              |
>>  prstate        | root         | root         | member       |
>> 


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/21 18:32, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>> I have carefully considered the scenario where the parent's effective CPUs are
>>> empty, which corresponds to the following two cases (after applying this patch).
>>>
>>>    root cgroup
>>>         |
>>>        A1
>>>       /  \
>>>     A2    A3
>>>
>>> Case 1:
>>>  Step:
>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>
>>>  After step #4, 
>>>
>>>                 |      A1      |      A2      |      A3      |
>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>  effective_cpus |              | 0-1          |              |
>>>  prstate        | root         | root         | member       |
>>>
>>>  After step #4, A3's effective CPUs are empty.
>>>
>>
>> That may be an unexpected corner case.
>>
>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>
>>
>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
> 
> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
> And what parameters should be configured for A1's cpuset.cpus?
> Could you provide a specific example?
> 

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs

You might argue that we haven't set the cpus for A4/A5...; yes, that may be a corner case.

However, it is common practice to configure a cpuset's cpus first and then migrate processes; this
is a typical usage scenario.


> Additionally, processes cannot be migrated into a cgroup whose 
> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
> 
> So why does applying this patch enable such migration?
> 


-- 
Best regards,
Ridong
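
The migration concern in this exchange comes down to one check: a task
cannot be attached to a cgroup whose cpuset.cpus.effective is empty. A
minimal userspace sketch of that check follows. It is an illustration
only: toy_member_effective() and toy_can_attach() are hypothetical names,
a member's effective mask is assumed to be its request intersected with
the parent's effective mask, and the specific errno returned is an
assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed model: a member gets whatever part of its request the parent still has. */
static uint64_t toy_member_effective(uint64_t requested,
				     uint64_t parent_effective)
{
	return requested & parent_effective;
}

/* Returns 0 if a task may be attached, or a negative errno if not. */
static int toy_can_attach(uint64_t effective_cpus)
{
	return effective_cpus ? 0 : -ENOSPC;
}
```

In the A4/A5 example, A1's effective mask is already empty once A2 has
become a valid root on CPUs 0-1, so A4's request for CPU 0 yields an
empty effective mask and the write to A4/cgroup.procs is refused in this
model.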

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Ridong,

On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
>On 2025/11/21 18:32, Sun Shaojie wrote:
>> Hi, Ridong,
>> 
>> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>>> I have carefully considered the scenario where the parent's effective CPUs are
>>>> empty, which corresponds to the following two cases (after applying this patch).
>>>>
>>>>    root cgroup
>>>>         |
>>>>        A1
>>>>       /  \
>>>>     A2    A3
>>>>
>>>> Case 1:
>>>>  Step:
>>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>>
>>>>  After step #4, 
>>>>
>>>>                 |      A1      |      A2      |      A3      |
>>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>>  effective_cpus |              | 0-1          |              |
>>>>  prstate        | root         | root         | member       |
>>>>
>>>>  After step #4, A3's effective CPUs is empty.
>>>>
>>>
>>> That may be a corner case is unexpected.
>>>
>>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>>
>>>
>>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
>> 
>> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
>> And what parameters should be configured for A1's cpuset.cpus?
>> Could you provide a specific example?
>> 
>
> #1> echo "0-1" > A1/cpuset.cpus
> #2> echo "root" > A1/cpuset.cpus.partition
> #3> echo "0-1" > A2/cpuset.cpus
> #4> echo "root" > A2/cpuset.cpus.partition
> mkdir A4
> mkdir A5
> echo "0" > A4/cpuset.cpus
> echo $$ > A4/cgroup.procs
> echo "1" > A5/cpuset.cpus
> echo $$ > A5/cgroup.procs
>
>You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
>
>However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is
>a typical usage scenario.
>

I'm sorry, I didn't quite understand the point you were trying to make with this example.

If this is the scenario you mean:

     root cgroup
          |
          A1
       / /  \ \
     A2 A3  A4 A5

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.

Even with this patch applied, this result will not change.
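
As a side note for anyone reproducing the steps above: the failing writes to
cgroup.procs can be anticipated by checking cpuset.cpus.effective first. The
following is only a minimal sketch; the helper names has_effective_cpus and
migrate_into are made up for illustration and are not part of the patch or of
any existing tool. It assumes the cgroup directory is passed as a path.

```shell
# has_effective_cpus CGROUP_DIR   (hypothetical helper)
# Exit 0 if CGROUP_DIR/cpuset.cpus.effective lists at least one CPU,
# 1 if it is empty, 2 if the file cannot be read.
has_effective_cpus() {
    eff=$(cat "$1/cpuset.cpus.effective" 2>/dev/null) || return 2
    [ -n "$eff" ]
}

# migrate_into CGROUP_DIR PID   (hypothetical helper)
# Write PID to cgroup.procs only when the target has effective CPUs,
# so the caller gets a readable message instead of a bare write error.
migrate_into() {
    cg=$1
    pid=$2
    if has_effective_cpus "$cg"; then
        echo "$pid" > "$cg/cgroup.procs"
    else
        echo "refusing: $cg has no effective CPUs" >&2
        return 1
    fi
}
```

For the example above this would be called as `migrate_into A4 $$` after
`echo "0" > A4/cpuset.cpus`.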

>
>> Additionally, processes cannot be migrated into a cgroup whose 
>> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
>> 
>> So why does applying this patch enable such migration?

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/24 18:20, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Sat, 22 Nov 2025 09:33:34, Chen Ridong wrote:
>> On 2025/11/21 18:32, Sun Shaojie wrote:
>>> Hi, Ridong,
>>>
>>> On Thu, 20 Nov 2025 21:45:16, Chen Ridong wrote:
>>>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>>>> I have carefully considered the scenario where parent effective CPUs are 
>>>>> empty, which corresponds to the following two cases. (After apply this patch).
>>>>>
>>>>>    root cgroup
>>>>>         |
>>>>>        A1
>>>>>       /  \
>>>>>     A2    A3
>>>>>
>>>>> Case 1:
>>>>>  Step:
>>>>>  #1> echo "0-1" > A1/cpuset.cpus
>>>>>  #2> echo "root" > A1/cpuset.cpus.partition
>>>>>  #3> echo "0-1" > A2/cpuset.cpus
>>>>>  #4> echo "root" > A2/cpuset.cpus.partition
>>>>>
>>>>>  After step #4, 
>>>>>
>>>>>                 |      A1      |      A2      |      A3      |
>>>>>  cpus_allowed   | 0-1          | 0-1          |              |
>>>>>  effective_cpus |              | 0-1          |              |
>>>>>  prstate        | root         | root         | member       |
>>>>>
>>>>>  After step #4, A3's effective CPUs is empty.
>>>>>
>>>>
>>>> That may be a corner case is unexpected.
>>>>
>>>>>  #5> echo "0-1" > A3/cpuset.cpus
>>>>>
>>>>
>>>> If we create subdirectories (e.g., A4, A5, ...) under the A1 cpuset and then configure cpuset.cpus
>>>> for A1 (a common usage scenario), processes can no longer be migrated into these subdirectories (A4,
>>>> A5, ...) afterward. However, prior to your patch, this migration was allowed.
>>>
>>> Are you referring to creating subdirectories (A4, A5, ...) after step #4? 
>>> And what parameters should be configured for A1's cpuset.cpus?
>>> Could you provide a specific example?
>>>
>>
>> #1> echo "0-1" > A1/cpuset.cpus
>> #2> echo "root" > A1/cpuset.cpus.partition
>> #3> echo "0-1" > A2/cpuset.cpus
>> #4> echo "root" > A2/cpuset.cpus.partition
>> mkdir A4
>> mkdir A5
>> echo "0" > A4/cpuset.cpus
>> echo $$ > A4/cgroup.procs
>> echo "1" > A5/cpuset.cpus
>> echo $$ > A5/cgroup.procs
>>
>> You might be going to argue that we haven't set the cpus for A4/A5..., yeah, maybe a corner case.
>>
>> However, it’s common practice to configure a cpuset’s cpus first and then migrate processes—this is
>> a typical usage scenario.
>>
> 
> I'm sorry, I didn't quite understand the point you were trying to make with this example.
> 
> If that's the case
> 
>      root cgroup
>           |
>           A1
>        / /  \ \
>      A2 A3  A4 A5
> 
>  #1> echo "0-1" > A1/cpuset.cpus
>  #2> echo "root" > A1/cpuset.cpus.partition
>  #3> echo "0-1" > A2/cpuset.cpus
>  #4> echo "root" > A2/cpuset.cpus.partition
>  mkdir A4
>  mkdir A5
>  echo "0" > A4/cpuset.cpus

If we don't apply your patch, A2 will be invalidated.

>  echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
>  echo "1" > A5/cpuset.cpus
>  echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.
> 
> Even with this patch applied, this result will not change.
> 

You can try it yourself; this is the result I got:

# mkdir A1
# echo "0-1" > A1/cpuset.cpus
# echo "root" > A1/cpuset.cpus.partition
# cd A1/
# mkdir A2
# mkdir A4
# mkdir A5
# echo "0-1" > A2/cpuset.cpus
# echo "root" > A2/cpuset.cpus.partition
#
# echo "0" > A4/cpuset.cpus
# cat A2/cpuset.cpus
0-1
# cat A2/cpuset.cpus.partition
root invalid
# cat A4/cpuset.cpus.effective
0

>>
>>> Additionally, processes cannot be migrated into a cgroup whose 
>>> cpuset.cpus.effective is empty. However, this patch does not modify this behavior.
>>>
>>> So why does applying this patch enable such migration?
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Ridong,

On Mon, 24 Nov 2025 19:33:54, Chen Ridong wrote:
>On 2025/11/24 18:20, Sun Shaojie wrote:
>> I'm sorry, I didn't quite understand the point you were trying to make with this example.
>> 
>> If that's the case
>> 
>>      root cgroup
>>           |
>>           A1
>>        / /  \ \
>>      A2 A3  A4 A5
>> 
>>  #1> echo "0-1" > A1/cpuset.cpus
>>  #2> echo "root" > A1/cpuset.cpus.partition
>>  #3> echo "0-1" > A2/cpuset.cpus
>>  #4> echo "root" > A2/cpuset.cpus.partition
>>  mkdir A4
>>  mkdir A5
>>  echo "0" > A4/cpuset.cpus
>
>If we don't apply your patch, A2 will be invalidated.
>
>>  echo $$ > A4/cgroup.procs  ->This will return an error because A4's effective CPUs are empty.
>>  echo "1" > A5/cpuset.cpus
>>  echo $$ > A5/cgroup.procs  ->This will return an error because A5's effective CPUs are empty.
>> 
>> Even with this patch applied, this result will not change.
>> 
>
>You can have a try, the result I got:
>
># mkdir A1
># echo "0-1" > A1/cpuset.cpus
># echo "root" > A1/cpuset.cpus.partition
># cd A1/
># mkdir A2
># mkdir A4
># mkdir A5
># echo "0-1" > A2/cpuset.cpus
># echo "root" > A2/cpuset.cpus.partition
>#
># echo "0" > A4/cpuset.cpus
># cat A2/cpuset.cpus
>0-1
># cat A2/cpuset.cpus.partition
>root invalid
># cat A4/cpuset.cpus.effective
>0

A4's cpuset.cpus.effective is 0 because A2 changed from root to root invalid. 
However, the purpose of this patch is precisely to keep A2 as "root".

Before 'echo "0" > A4/cpuset.cpus', A4 is aware that its cpuset.cpus.effective
is empty and that its parent's cpuset.cpus.effective is also empty. Therefore,
after executing 'echo "0" > A4/cpuset.cpus', A4 should anticipate the 
possibility that it may not be allocated any available CPUs.
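
When comparing runs with and without the patch, it may help to capture the
requested CPUs, the effective CPUs, and the partition state of each cgroup
after every step. A rough sketch follows; show_cpuset_state is a hypothetical
helper name, and the cgroup directories are assumed to be given as paths
relative to the current directory.

```shell
# show_cpuset_state CGROUP_DIR...   (hypothetical helper)
# Print one line per cgroup with its requested CPUs, effective CPUs,
# and partition state. Missing files are shown as empty brackets.
show_cpuset_state() {
    for cg in "$@"; do
        printf '%s: cpus=[%s] effective=[%s] partition=[%s]\n' \
            "$cg" \
            "$(cat "$cg/cpuset.cpus" 2>/dev/null)" \
            "$(cat "$cg/cpuset.cpus.effective" 2>/dev/null)" \
            "$(cat "$cg/cpuset.cpus.partition" 2>/dev/null)"
    done
}
```

Running `show_cpuset_state A2 A4 A5` from inside A1 right after
`echo "0" > A4/cpuset.cpus` shows in one place whether A2 stayed "root" and
whether A4 received any CPUs.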

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months, 3 weeks ago
On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary. If the cpuset being
> modified is exclusive, it should invalidate itself upon conflict.
> 
> This patch applies only to the following two cases:
> 
> Assume the machine has 4 CPUs (0-3).
> 
>    root cgroup
>       /    \
>     A1      B1
> 
> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
> 
>  Table 1.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
> 
> After step #3, A1 changes from "root" to "root invalid" because its CPUs
> (0-1) overlap with those requested by B1 (0). However, B1 can actually
> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
> remain as "root."
> 
>  Table 1.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
> 
> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus

(Thanks for working this out, Shaojie.)

> 
>  Table 2.1: Before applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
> 
> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
> regardless of what conflicting value B1 writes to cpuset.cpus, it will
> always have at least CPU 2 available. This makes it unnecessary to mark
> A1 as "root invalid".
> 
>  Table 2.2: After applying this patch
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
> 
> In summary, regardless of how B1 configures its cpuset.cpus, there will
> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
> is no need to change A1 from "root" to "root invalid".

Admittedly, I don't like this change because it relies on implicit
preference ordering between siblings (here first comes, first served)
and so the effective config cannot be derived just from the applied
values :-/

Do you actually want to achieve this or is it an implementation
side-effect of the Case 1 scenario that you want to achieve?


Thanks,
Michal
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 3 weeks ago
Hi, Michal,

On Wed, 19 Nov 2025 14:20:25, Michal Koutný wrote:
>On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>> 
>> This patch applies only to the following two cases:
>> 
>> Assume the machine has 4 CPUs (0-3).
>> 
>>    root cgroup
>>       /    \
>>     A1      B1
>> 
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>> 
>>  Table 1.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>> 
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>> 
>>  Table 1.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>> 
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
>
>(Thanks for working this out, Shaojie.)
>
>> 
>>  Table 2.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>> 
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>> 
>>  Table 2.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>> 
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
>
>Admittedly, I don't like this change because it relies on implicit
>preference ordering between siblings (here first comes, first served)
>and so the effective config cannot be derived just from the applied
>values :-/
>
>Do you actually want to achieve this or is it an implementation
>side-effect of the Case 1 scenario that you want to achieve?

Yes, this is indeed the functionality I intended to achieve, as I find it 
follows the same logic as Case 1.

However, I didn't fully understand what you meant by "implicit preference 
ordering between siblings (here first comes, first served)."
Could you provide an example?

As for your point that "the effective config cannot be derived just from 
the applied values," even before this patch, we couldn't derive the final 
effective configuration solely from the applied values.

For example, consider the following scenario (without this patch applied):
Table 1:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |

Table 2:
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
 #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |

After step #3, both Table 1 and Table 2 have identical value settings, 
yet A1's partition state differs between them.
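
In both tables the conflict is simply that the two requested CPU lists share
CPU 1. For readers who want to check other value pairs, here is a small
sketch that computes the overlap of two cpuset.cpus strings. It is purely
illustrative: expand_cpulist and cpulist_overlap are hypothetical helper
names, and the sketch assumes the plain "N" and "N-M" comma-separated list
format used in this thread.

```shell
# expand_cpulist LIST   (hypothetical helper)
# Expand a list such as "0-1,3" into "0 1 3".
# The body is a subshell so the IFS change does not leak to the caller.
expand_cpulist() (
    IFS=,
    out=
    for part in $1; do
        lo=${part%-*}   # text before the dash (whole item if no dash)
        hi=${part#*-}   # text after the dash (whole item if no dash)
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out$i "
            i=$((i + 1))
        done
    done
    printf '%s\n' "${out% }"
)

# cpulist_overlap LIST_A LIST_B   (hypothetical helper)
# Print the CPUs present in both lists; print an empty line if none.
cpulist_overlap() (
    common=
    for a in $(expand_cpulist "$1"); do
        for b in $(expand_cpulist "$2"); do
            if [ "$a" -eq "$b" ]; then
                common="$common$a "
            fi
        done
    done
    printf '%s\n' "${common% }"
)
```

For the values above, `cpulist_overlap "0-1" "1-2"` reports CPU 1, which is
the same for Table 1 and Table 2; only the order of the writes differs.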


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Michal Koutný 2 months, 2 weeks ago
On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
> >Do you actually want to achieve this or is it an implementation
> >side-effect of the Case 1 scenario that you want to achieve?
> 
> Yes, this is indeed the functionality I intended to achieve, as I find it 
> follows the same logic as Case 1.

So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
be taken away from you by any sibling, correct?
My reasoning is that the siblings should be under one management entity
and therefore such overcommitment should be avoided already in the
configuration. Invalidating all conflicting siblings is then the most
fair result achievable.
Is B1 a second-class partition _only_ because it starts later? If not, why
is it OK to not fulfill its requirement?

[1] Note that A1 should still watch its cpuset.cpus.partition if it
takes exclusivity seriously because its cpus may be taken away by
hot(un)plug or ancestry reconfiguration.
	

> As for your point that "the effective config cannot be derived just from 
> the applied values," even before this patch, we couldn't derive the final 
> effective configuration solely from the applied values.
> 
> For example, consider the following scenario: (not apply this patch)
> Table 1:
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
> 
> Table 2:
>  Step                                       | A1's prstate | B1's prstate |
>  #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
>  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
>  #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
> 
> After step #3, both Table 1 and Table 2 have identical value settings, 
> yet A1's partition state differs between them.

Aha, I must admit I didn't expect that. IMO, nothing (documented)
prevents the latter (Table 2) behavior (here I'm referring to
cpuset.cpus, not sure about cpuset.cpus.exclusive).
Which of Table 1 or Table 2 do you prefer?

Thanks,
Michal
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 1 week ago
Hi, Michal

On Wed, 26 Nov 2025 15:13:13, Michal Koutný wrote:
>On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> >Do you actually want to achieve this or is it an implementation
>> >side-effect of the Case 1 scenario that you want to achieve?
>> 
>> Yes, this is indeed the functionality I intended to achieve, as I find it 
>> follows the same logic as Case 1.
>
>So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
>be taken away from you by any sibling, correct?
>My reasoning is that the siblings should be under one management entity
>and therefore such overcommitment should be avoided already in the
>configuration. Invalidating all conflicting siblings is then the most
>fair result achievable.
>B1 is a second-class partition _only_ because it starts later or why is
>it OK to not fulfill its requirement?
>

If the siblings are under a single management entity, that certainly works.
But what if there are multiple administrative users? Should we really
violate other users' requirements just to satisfy one user's requirement?
Given this, first-come-first-served might be fairer.

>[1] Note that A1 should still watch its cpuset.cpus.partition if it
>takes exclusivity seriously because its cpus may be taken away by
>hot(un)plug or ancestry reconfiguration.

Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/26 22:13, Michal Koutný wrote:
> On Thu, Nov 20, 2025 at 09:05:57PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>> Do you actually want to achieve this or is it an implementation
>>> side-effect of the Case 1 scenario that you want to achieve?
>>
>> Yes, this is indeed the functionality I intended to achieve, as I find it 
>> follows the same logic as Case 1.
> 
> So you want to achieve a stable [1] set of CPUs for a cgroup that cannot
> be taken away from you by any sibling, correct?
> My reasoning is that the siblings should be under one management entity
> and therefore such overcommitment should be avoided already in the
> configuration. Invalidating all conflicting siblings is then the most
> fair result achievable.
> B1 is a second-class partition _only_ because it starts later or why is
> it OK to not fulfill its requirement?
> 
> [1] Note that A1 should still watch its cpuset.cpus.partition if it
> takes exclusivity seriously because its cpus may be taken away by
> hot(un)plug or ancestry reconfiguration.
> 	
> 
>> As for your point that "the effective config cannot be derived just from 
>> the applied values," even before this patch, we couldn't derive the final 
>> effective configuration solely from the applied values.
>>
>> For example, consider the following scenario: (not apply this patch)
>> Table 1:
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
>>
>> Table 2:
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "1-2" > B1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |
>>  #3> echo "0-1" > A1/cpuset.cpus            | root         | member       |
>>
>> After step #3, both Table 1 and Table 2 have identical value settings, 
>> yet A1's partition state differs between them.
> 

This is a corner case that should be fixed, and I have sent a patch for it.

https://lore.kernel.org/cgroups/20251115093140.1121329-1-chenridong@huaweicloud.com/

> Aha, I must admit I didn't expect that. IMO, nothing (documented)
> prevents the latter (Table 2) behavior (here I'm referring to
> cpuset.cpus, not sure about cpuset.cpus.exclusive).
> Which of Table 1 or Table do you prefer?
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/19 21:20, Michal Koutný wrote:
> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
>> with its sibling partition, the sibling's partition state becomes invalid.
>> However, this invalidation is often unnecessary. If the cpuset being
>> modified is exclusive, it should invalidate itself upon conflict.
>>
>> This patch applies only to the following two cases:
>>
>> Assume the machine has 4 CPUs (0-3).
>>
>>    root cgroup
>>       /    \
>>     A1      B1
>>
>> Case 1: A1 is exclusive, B1 is non-exclusive, set B1's cpuset.cpus
>>
>>  Table 1.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>
>> After step #3, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0). However, B1 can actually
>> use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
>> remain as "root."
>>
>>  Table 1.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "0" > B1/cpuset.cpus              | root         | member       |
>>
>> Case 2: Both A1 and B1 are exclusive, set B1's cpuset.cpus
> 
> (Thanks for working this out, Shaojie.)
> 
>>
>>  Table 2.1: Before applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>
>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>> always have at least CPU 2 available. This makes it unnecessary to mark
>> A1 as "root invalid".
>>
>>  Table 2.2: After applying this patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>
>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>> is no need to change A1 from "root" to "root invalid".
> 
> Admittedly, I don't like this change because it relies on implicit
> preference ordering between siblings (here first comes, first served)

Agreed. If we only invalidate the later one, then I think, regardless of the implementation approach,
we may end up with different results depending on the order of operations.

> and so the effective config cannot be derived just from the applied
> values :-/
> 
> Do you actually want to achieve this or is it an implementation
> side-effect of the Case 1 scenario that you want to achieve?
> 
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 3 weeks ago
Hi, Ridong,

On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:
>On 2025/11/19 21:20, Michal Koutný wrote:
>> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>>  Table 2.1: Before applying this patch
>>>  Step                                       | A1's prstate | B1's prstate |
>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>>
>>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>>> always have at least CPU 2 available. This makes it unnecessary to mark
>>> A1 as "root invalid".
>>>
>>>  Table 2.2: After applying this patch
>>>  Step                                       | A1's prstate | B1's prstate |
>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>>
>>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>>> is no need to change A1 from "root" to "root invalid".
>> 
>> Admittedly, I don't like this change because it relies on implicit
>> preference ordering between siblings (here first comes, first served)
>
>Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we
>may end up with different results depending on the order of operations.


I don't understand the "order of operations" mentioned here. After reviewing
the previous email content, are you referring to this?

On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>With the result you expect, would we observe the following behaviors:
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "1-2"  > B1/cpuset.cpus
>#5> echo "root" > A1/cpuset.cpus.partition
>#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "1-2"  > B1/cpuset.cpus
>#5> echo "root" > B1/cpuset.cpus.partition
>#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>
>Do different operation orders yield different results? If so, this is not what we expect.

However, after applying this patch, the outcomes of these two examples are 
as follows:
 
 #1> mkdir -p A1
 #2> mkdir -p B1
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
 #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
 #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
 #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|

 #1> mkdir -p A1
 #2> mkdir -p B1
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
 #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
 #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
 #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|

Moreover, even without applying this patch, the result remains the same,
because modifying cpuset.cpus.partition does not disable its siblings' partitions.

So, what are the specific issues that you believe would arise?


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 3 weeks ago

On 2025/11/20 21:07, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 08:57:51, Chen Ridong wrote:
>> On 2025/11/19 21:20, Michal Koutný wrote:
>>> On Wed, Nov 19, 2025 at 06:57:49PM +0800, Sun Shaojie <sunshaojie@kylinos.cn> wrote:
>>>>  Table 2.1: Before applying this patch
>>>>  Step                                       | A1's prstate | B1's prstate |
>>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>>  #5> echo "1-2" > B1/cpuset.cpus            | root invalid | root invalid |
>>>>
>>>> After step #4, B1 can exclusively use CPU 2. Therefore, at step #5,
>>>> regardless of what conflicting value B1 writes to cpuset.cpus, it will
>>>> always have at least CPU 2 available. This makes it unnecessary to mark
>>>> A1 as "root invalid".
>>>>
>>>>  Table 2.2: After applying this patch
>>>>  Step                                       | A1's prstate | B1's prstate |
>>>>  #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>>>>  #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>>>  #3> echo "2" > B1/cpuset.cpus              | root         | member       |
>>>>  #4> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>>>>  #5> echo "1-2" > B1/cpuset.cpus            | root         | root invalid |
>>>>
>>>> In summary, regardless of how B1 configures its cpuset.cpus, there will
>>>> always be available CPUs in B1's cpuset.cpus.effective. Therefore, there
>>>> is no need to change A1 from "root" to "root invalid".
>>>
>>> Admittedly, I don't like this change because it relies on implicit
>>> preference ordering between siblings (here first comes, first served)
>>
>> Agree. If we only invalidate the latter one, I think regardless of the implementation approach, we
>> may end up with different results depending on the order of operations.
> 
> 
> I don't understand the "order of operations" mentioned here. After reviewing
> the previous email content, are you referring to this?
> 
> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>> With the result you expect, would we observe the following behaviors:
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "1-2"  > B1/cpuset.cpus
>> #5> echo "root" > A1/cpuset.cpus.partition
>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "1-2"  > B1/cpuset.cpus
>> #5> echo "root" > B1/cpuset.cpus.partition
>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>
>> Do different operation orders yield different results? If so, this is not what we expect.
> 
> However, after applying this patch, the outcomes of these two examples are 
> as follows:
>  
>  #1> mkdir -p A1
>  #2> mkdir -p B1
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
> 
>  #1> mkdir -p A1
>  #2> mkdir -p B1
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
> 

How about the following two sequences of operations:

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition
#5> echo "1-2"  > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition


#1> mkdir -p A1
#2> mkdir -p B1
#5> echo "1-2"  > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "root" > A1/cpuset.cpus.partition

Will these two sequences yield the same result?

As a key requirement: Regardless of the order in which we apply the configurations, identical final
settings should always result in identical system states. We need to confirm if this holds true here.
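
One way to check this requirement empirically is to apply the same writes in two different orders and compare the resulting cpuset.cpus.partition values. The following is only a minimal sketch, not part of the patch or the selftests: CG_ROOT, apply_steps and final_state are hypothetical names, and it assumes CG_ROOT is a cgroup v2 directory in which A1 and B1 already exist with the cpuset controller enabled.

```shell
#!/bin/sh
# Sketch only: CG_ROOT, apply_steps and final_state are illustrative names.
# CG_ROOT is assumed to be a cgroup v2 directory whose children A1 and B1
# have the cpuset controller enabled.
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup}

# apply_steps "value:file" ...
# Writes each value to $CG_ROOT/file, in the order given.
apply_steps() {
    for step in "$@"; do
        value=${step%%:*}   # text before the first ':'
        file=${step#*:}     # text after the first ':'
        # A rejected write is reported but does not stop the sequence.
        echo "$value" > "$CG_ROOT/$file" || echo "write failed: $step" >&2
    done
}

# final_state prints the partition state of A1 and B1 on one line,
# so that two runs can be compared as plain strings.
final_state() {
    printf 'A1=%s B1=%s\n' \
        "$(cat "$CG_ROOT/A1/cpuset.cpus.partition")" \
        "$(cat "$CG_ROOT/B1/cpuset.cpus.partition")"
}
```

For the two sequences above, one would call apply_steps "0-1:A1/cpuset.cpus" "root:A1/cpuset.cpus.partition" "1-2:B1/cpuset.cpus" "root:B1/cpuset.cpus.partition", record the output of final_state, recreate A1 and B1, and then repeat with the two B1 steps first. If the requirement holds, the two recorded lines are identical.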

> Moreover, even without applying this patch, the result remains the same,
> because modifying cpuset.cpus.partition does not disable its siblings' partitions.
> 
> So, what are the specific issues that you believe would arise?
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong

Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Ridong,

On Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
>On 2025/11/20 21:07, Sun Shaojie wrote:
>> I don't understand the "order of operations" mentioned here. After reviewing
>> the previous email content, are you referring to this?
>> 
>> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>>> With the result you expect, would we observe the following behaviors:
>>>
>>> #1> mkdir -p A1
>>> #2> mkdir -p B1
>>> #3> echo "0-1"  > A1/cpuset.cpus
>>> #4> echo "1-2"  > B1/cpuset.cpus
>>> #5> echo "root" > A1/cpuset.cpus.partition
>>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>>
>>> #1> mkdir -p A1
>>> #2> mkdir -p B1
>>> #3> echo "0-1"  > A1/cpuset.cpus
>>> #4> echo "1-2"  > B1/cpuset.cpus
>>> #5> echo "root" > B1/cpuset.cpus.partition
>>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>>
>>> Do different operation orders yield different results? If so, this is not what we expect.
>> 
>> However, after applying this patch, the outcomes of these two examples are 
>> as follows:
>>  
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
>> 
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
>> 
>
>How about the following two sequences of operations:
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "root" > A1/cpuset.cpus.partition
>#5> echo "1-2"  > B1/cpuset.cpus
>#6> echo "root" > B1/cpuset.cpus.partition
>
>
>#1> mkdir -p A1
>#2> mkdir -p B1
>#5> echo "1-2"  > B1/cpuset.cpus
>#6> echo "root" > B1/cpuset.cpus.partition
>#3> echo "0-1"  > A1/cpuset.cpus
>#4> echo "root" > A1/cpuset.cpus.partition
>
>Will these two sequences yield the same result?

>As a key requirement: Regardless of the order in which we apply the configurations, identical final
>settings should always result in identical system states. We need to confirm if this holds true here.

Is this truly a key requirement? It appears this requirement wasn't met even
before applying my patch.

The example below, which does not use this patch, demonstrates how different
sequences with identical configurations can still lead to different system
states.

 #1> mkdir -p A1
 #2> mkdir -p B1                            | A1's prstate | B1's prstate |
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
 #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
 #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
 #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |

 #1> mkdir -p A1
 #2> mkdir -p B1                            | A1's prstate | B1's prstate |
 #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
 #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
 #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
 #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
 #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |

Even without this patch, the result can still differ.
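
Tables like the two above can be regenerated with a small helper that performs one write and then prints both partition states. This is a minimal sketch under stated assumptions: CG_ROOT and step are hypothetical names, and CG_ROOT is assumed to be a cgroup v2 directory where A1 and B1 already exist with the cpuset controller enabled.

```shell
#!/bin/sh
# Sketch only: CG_ROOT and step are illustrative names, not kernel interfaces.
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup}

# step VALUE FILE
# Writes VALUE to $CG_ROOT/FILE, then prints one table row showing the
# command and the partition state of A1 and B1 after the write.
step() {
    echo "$1" > "$CG_ROOT/$2"
    printf 'echo "%s" > %-26s | %-12s | %-12s |\n' "$1" "$2" \
        "$(cat "$CG_ROOT/A1/cpuset.cpus.partition" 2>/dev/null)" \
        "$(cat "$CG_ROOT/B1/cpuset.cpus.partition" 2>/dev/null)"
}
```

Calling step "0-1" A1/cpuset.cpus, then step "root" A1/cpuset.cpus.partition, and so on, in each of the two orders, prints one row per write, so the two runs can be compared with diff.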


Thanks,
Sun Shaojie
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Chen Ridong 2 months, 2 weeks ago

On 2025/11/21 18:33, Sun Shaojie wrote:
> Hi, Ridong,
> 
> On Thu, 20 Nov 2025 21:25:12, Chen Ridong wrote:
>> On 2025/11/20 21:07, Sun Shaojie wrote:
>>> I don't understand the "order of operations" mentioned here. After reviewing
>>> the previous email content, are you referring to this?
>>>
>>> On Sat, 15 Nov 2025 15:41:03, Chen Ridong wrote:
>>>> With the result you expect, would we observe the following behaviors:
>>>>
>>>> #1> mkdir -p A1
>>>> #2> mkdir -p B1
>>>> #3> echo "0-1"  > A1/cpuset.cpus
>>>> #4> echo "1-2"  > B1/cpuset.cpus
>>>> #5> echo "root" > A1/cpuset.cpus.partition
>>>> #6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
>>>>
>>>> #1> mkdir -p A1
>>>> #2> mkdir -p B1
>>>> #3> echo "0-1"  > A1/cpuset.cpus
>>>> #4> echo "1-2"  > B1/cpuset.cpus
>>>> #5> echo "root" > B1/cpuset.cpus.partition
>>>> #6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
>>>>
>>>> Do different operation orders yield different results? If so, this is not what we expect.
>>>
>>> However, after applying this patch, the outcomes of these two examples are 
>>> as follows:
>>>  
>>>  #1> mkdir -p A1
>>>  #2> mkdir -p B1
>>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>>  #5> echo "root" > A1/cpuset.cpus.partition | root invalid | root        |
>>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid|
>>>
>>>  #1> mkdir -p A1
>>>  #2> mkdir -p B1
>>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member      |
>>>  #4> echo "1-2"  > B1/cpuset.cpus           | member       | member      |
>>>  #5> echo "root" > B1/cpuset.cpus.partition | root         | root invalid|
>>>  #6> echo "root" > A1/cpuset.cpus.partition | root invalid | root invalid|
>>>
>>
>> How about the following two sequences of operations:
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "root" > A1/cpuset.cpus.partition
>> #5> echo "1-2"  > B1/cpuset.cpus
>> #6> echo "root" > B1/cpuset.cpus.partition
>>
>>
>> #1> mkdir -p A1
>> #2> mkdir -p B1
>> #5> echo "1-2"  > B1/cpuset.cpus
>> #6> echo "root" > B1/cpuset.cpus.partition
>> #3> echo "0-1"  > A1/cpuset.cpus
>> #4> echo "root" > A1/cpuset.cpus.partition
>>
>> Will these two sequences yield the same result?
> 
>> As a key requirement: Regardless of the order in which we apply the configurations, identical final
>> settings should always result in identical system states. We need to confirm if this holds true here.
> 
> Is this truly a key requirement? It appears this requirement wasn't met even
> before applying my patch.
> 

I believe it is required; there may be some corner cases that we should fix.

> The example below, which does not use this patch, demonstrates how different
> sequences with identical configurations can still lead to different system
> states.
> 
>  #1> mkdir -p A1
>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>  #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
>  #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
>  #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |
> 

IIUC, you've created this example with the expectation that both A1 and B1 should serve as root
partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus,
cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.

The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8
is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to
passively and automatically transition A1 to a valid partition state.
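
Because a sibling's later writes do not bring A1 back to a valid state, a script that manages partitions may want to detect the invalid state explicitly. Reading cpuset.cpus.partition returns strings such as "member", "root", or "root invalid (<reason>)". The sketch below matches on that text. The function name is_invalid is hypothetical.

```shell
#!/bin/sh
# Sketch only: is_invalid is an illustrative helper name.
# is_invalid STATE
# Succeeds if STATE, as read from cpuset.cpus.partition, denotes an
# invalid partition root, e.g. "root invalid (<reason>)".
is_invalid() {
    case "$1" in
        *" invalid"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_invalid "$(cat A1/cpuset.cpus.partition)" could be used after step #8 to decide whether A1 needs attention.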

>  #1> mkdir -p A1
>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>  #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
>  #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>  #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
>  #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
> 
> Even without this patch, the result can still differ.
> 
> 
> Thanks,
> Sun Shaojie

-- 
Best regards,
Ridong
Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on cpuset.cpus conflict.
Posted by Sun Shaojie 2 months, 2 weeks ago
Hi, Ridong,

On Sat, 22 Nov 2025 09:19:39, Chen Ridong wrote:
>On 2025/11/21 18:33, Sun Shaojie wrote:
>> Is this truly a key requirement? It appears this requirement wasn't met even
>> before applying my patch.
>> 
>
>I believe it is required; there may be some corner cases that we should fix.
>
>> The example below, which does not use this patch, demonstrates how different
>> sequences with identical configurations can still lead to different system
>> states.
>> 
>>  #1> mkdir -p A1
>>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>>  #5> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #6> echo "1-2"  > B1/cpuset.cpus           | root invalid | member       |
>>  #7> echo "2-3"  > B1/cpuset.cpus.exclusive | root invalid | member       |
>>  #8> echo "root" > B1/cpuset.cpus.partition | root invalid | root         |
>> 
>
>IIUC, you've created this example with the expectation that both A1 and B1 should serve as root
>partitions. However, we currently lack a mechanism where modifying a cpuset's state (e.g., cpus,
>cpus.exclusive, or cpus.partition) can transition its sibling from an invalid to a valid partition.
>
>The behavior observed before step #6 is acceptable. Proactively setting B1 as a partition in step #8
>is permitted, given that B1 does not conflict with A1. However, we do not have a mechanism to
>passively and automatically transition A1 to a valid partition state.
>

So, was the original behavior of invalidating sibling partitions driven by this key requirement?
(As a key requirement: Regardless of the order in which we apply the configurations, identical final
settings should always result in identical system states.)

>>  #1> mkdir -p A1
>>  #2> mkdir -p B1                            | A1's prstate | B1's prstate |
>>  #3> echo "0-1"  > A1/cpuset.cpus           | member       | member       |
>>  #4> echo "0-1"  > A1/cpuset.cpus.exclusive | member       | member       |
>>  #5> echo "2-3"  > B1/cpuset.cpus.exclusive | member       | member       |
>>  #6> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>>  #7> echo "1-2"  > B1/cpuset.cpus           | root         | member       |
>>  #8> echo "root" > B1/cpuset.cpus.partition | root         | root         |
>> 
>> Even without this patch, the result can still differ.
>> 

Thanks,
Sun Shaojie