[PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing

Waiman Long posted 1 patch 1 week, 1 day ago
kernel/cgroup/cpuset.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
[PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Waiman Long 1 week, 1 day ago
With some recent proposed changes [1] in the deadline server code,
it has caused a test failure in test_cpuset_prs.sh when a change
is being made to an isolated partition. This is due to failing
the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
validate_change().

This is actually a false positive as the failed test case involves an
isolated partition with load balancing disabled. The deadline check
is not meaningful in this case and the users should know what they
are doing.

Fix this by doing the cpuset_cpumask_can_shrink() check only when loading
balanced is enabled. Also change its arguments to use effective_cpus
for the current cpuset and user_xcpus() as an approiximation for the
target effective_cpus as the real effective_cpus hasn't been fully
computed yet as this early stage.

As the check isn't comprehensive, there may be false positives or
negatives. We may have to revise the code to do a more thorough check
in the future if this becomes a concern.

[1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 655396e75b58..21050cacff0d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -581,12 +581,24 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	/*
 	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
-	 * tasks.
+	 * tasks. This check is not done when scheduling is disabled as the
+	 * users should know what they are doing.
+	 *
+	 * For v1, effective_cpus == cpus_allowed & user_xcpus() returns
+	 * cpus_allowed.
+	 *
+	 * For v2, is_cpu_exclusive() & is_sched_load_balance() are true only
+	 * for non-isolated partition root. At this point, the target
+	 * effective_cpus isn't computed yet. user_xcpus() is the best
+	 * approximation.
+	 *
+	 * TBD: May need to precompute the real effective_cpus here in case
+	 * incorrect scheduling of SCHED_DEADLINE tasks in a partition
+	 * becomes an issue.
 	 */
 	ret = -EBUSY;
-	if (is_cpu_exclusive(cur) &&
-	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
-				       trial->cpus_allowed))
+	if (is_cpu_exclusive(cur) && is_sched_load_balance(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->effective_cpus, user_xcpus(trial)))
 		goto out;
 
 	/*
-- 
2.47.0
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Juri Lelli 1 week ago
Hi Waiman,

On 14/11/24 13:19, Waiman Long wrote:
> With some recent proposed changes [1] in the deadline server code,
> it has caused a test failure in test_cpuset_prs.sh when a change
> is being made to an isolated partition. This is due to failing
> the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
> validate_change().

What sort of change is being made to that isolated partition? Which test
is failing from the test_cpuset_prs.sh collection? Asking because I now
see "All tests PASSED" running that locally (with all my 3 patches on
top of cgroup/for-6.13 w/o this last patch from you).

Thanks,
Juri
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Waiman Long 1 week ago
On 11/15/24 5:54 AM, Juri Lelli wrote:
> Hi Waiman,
>
> On 14/11/24 13:19, Waiman Long wrote:
>> With some recent proposed changes [1] in the deadline server code,
>> it has caused a test failure in test_cpuset_prs.sh when a change
>> is being made to an isolated partition. This is due to failing
>> the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
>> validate_change().
> What sort of change is being made to that isolated partition? Which test
> is failing from the test_cpuset_prs.sh collection? Asking because I now
> see "All tests PASSED" running that locally (with all my 3 patches on
> top of cgroup/for-6.13 w/o this last patch from you).

The failing test isn't an isolated partition. The actual test failure is

Test TEST_MATRIX[62] failed result check!
C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 
A1:0-4,A2:1-4,A3:2-4 A1:P0,A2:P-2,A3:P-1

In this particular case, cgroup A3 has the following setting before the 
X5 operation.

A1/A2/A3/cpuset.cpus: 2-4
A1/A2/A3/cpuset.cpus.exclusive: 4
A1/A2/A3/cpuset.cpus.effective: 4
A1/A2/A3/cpuset.cpus.exclusive.effective: 4
A1/A2/A3/cpuset.cpus.partition: root

I believe this is fixed by my other change in the commit to change 
arguments of cpuset_cpumask_can_shrink() to use effective_cpus instead 
as cpus_allowed may not represent what CPUs are currently used in the 
partition.

Cheers,
Longman
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Juri Lelli 4 days, 11 hours ago
On 15/11/24 12:55, Waiman Long wrote:
> On 11/15/24 5:54 AM, Juri Lelli wrote:
> > Hi Waiman,
> > 
> > On 14/11/24 13:19, Waiman Long wrote:
> > > With some recent proposed changes [1] in the deadline server code,
> > > it has caused a test failure in test_cpuset_prs.sh when a change
> > > is being made to an isolated partition. This is due to failing
> > > the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
> > > validate_change().
> > What sort of change is being made to that isolated partition? Which test
> > is failing from the test_cpuset_prs.sh collection? Asking because I now
> > see "All tests PASSED" running that locally (with all my 3 patches on
> > top of cgroup/for-6.13 w/o this last patch from you).
> 
> The failing test isn't an isolated partition. The actual test failure is
> 
> Test TEST_MATRIX[62] failed result check!
> C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 A1:0-4,A2:1-4,A3:2-4
> A1:P0,A2:P-2,A3:P-1
> 
> In this particular case, cgroup A3 has the following setting before the X5
> operation.
> 
> A1/A2/A3/cpuset.cpus: 2-4
> A1/A2/A3/cpuset.cpus.exclusive: 4
> A1/A2/A3/cpuset.cpus.effective: 4
> A1/A2/A3/cpuset.cpus.exclusive.effective: 4
> A1/A2/A3/cpuset.cpus.partition: root

Right, and is this problematic already?

Then the test, I believe, does

# echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive

and that goes through and makes the setup invalid - root domain reconf
and the following

# cat cgroup/A1/cpuset.cpus.partition
member
# cat cgroup/A1/A2/cpuset.cpus.partition
isolated invalid (Parent is not a partition root)
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
root invalid (Parent is an invalid partition root)

Is this what shouldn't happen?
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Waiman Long 4 days, 7 hours ago
On 11/18/24 4:39 AM, Juri Lelli wrote:
> On 15/11/24 12:55, Waiman Long wrote:
>> On 11/15/24 5:54 AM, Juri Lelli wrote:
>>> Hi Waiman,
>>>
>>> On 14/11/24 13:19, Waiman Long wrote:
>>>> With some recent proposed changes [1] in the deadline server code,
>>>> it has caused a test failure in test_cpuset_prs.sh when a change
>>>> is being made to an isolated partition. This is due to failing
>>>> the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
>>>> validate_change().
>>> What sort of change is being made to that isolated partition? Which test
>>> is failing from the test_cpuset_prs.sh collection? Asking because I now
>>> see "All tests PASSED" running that locally (with all my 3 patches on
>>> top of cgroup/for-6.13 w/o this last patch from you).
>> The failing test isn't an isolated partition. The actual test failure is
>>
>> Test TEST_MATRIX[62] failed result check!
>> C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 A1:0-4,A2:1-4,A3:2-4
>> A1:P0,A2:P-2,A3:P-1
>>
>> In this particular case, cgroup A3 has the following setting before the X5
>> operation.
>>
>> A1/A2/A3/cpuset.cpus: 2-4
>> A1/A2/A3/cpuset.cpus.exclusive: 4
>> A1/A2/A3/cpuset.cpus.effective: 4
>> A1/A2/A3/cpuset.cpus.exclusive.effective: 4
>> A1/A2/A3/cpuset.cpus.partition: root
> Right, and is this problematic already?
We allow nested partition setup. So there can be a child partition 
underneath a parent partition. So this is OK.
>
> Then the test, I believe, does
>
> # echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive
>
> and that goes through and makes the setup invalid - root domain reconf
> and the following
>
> # cat cgroup/A1/cpuset.cpus.partition
> member
> # cat cgroup/A1/A2/cpuset.cpus.partition
> isolated invalid (Parent is not a partition root)
> # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> root invalid (Parent is an invalid partition root)
>
> Is this what shouldn't happen?
>
A3 should become invalid because none of the CPUs in 
cpuset.cpus.exclusive can be granted. However A2 should remain a valid 
partition. I will look further into that. Thank for spotting this 
inconsistency.

Cheers,
Longman
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Waiman Long 3 days, 18 hours ago
On 11/18/24 8:58 AM, Waiman Long wrote:
>>> The failing test isn't an isolated partition. The actual test 
>>> failure is
>>>
>>> Test TEST_MATRIX[62] failed result check!
>>> C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 
>>> A1:0-4,A2:1-4,A3:2-4
>>> A1:P0,A2:P-2,A3:P-1
>>>
>>> In this particular case, cgroup A3 has the following setting before 
>>> the X5
>>> operation.
>>>
>>> A1/A2/A3/cpuset.cpus: 2-4
>>> A1/A2/A3/cpuset.cpus.exclusive: 4
>>> A1/A2/A3/cpuset.cpus.effective: 4
>>> A1/A2/A3/cpuset.cpus.exclusive.effective: 4
>>> A1/A2/A3/cpuset.cpus.partition: root
>> Right, and is this problematic already?
> We allow nested partition setup. So there can be a child partition 
> underneath a parent partition. So this is OK.
>>
>> Then the test, I believe, does
>>
>> # echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive
>>
>> and that goes through and makes the setup invalid - root domain reconf
>> and the following
>>
>> # cat cgroup/A1/cpuset.cpus.partition
>> member
>> # cat cgroup/A1/A2/cpuset.cpus.partition
>> isolated invalid (Parent is not a partition root)
>> # cat cgroup/A1/A2/A3/cpuset.cpus.partition
>> root invalid (Parent is an invalid partition root)
>>
>> Is this what shouldn't happen?
>>
> A3 should become invalid because none of the CPUs in 
> cpuset.cpus.exclusive can be granted. However A2 should remain a valid 
> partition. I will look further into that. Thank for spotting this 
> inconsistency. 

Sorry, I misread the test. The X5 entry above refers to "echo 5 > 
A1/A2/cpuset.cpus.exclusive" not to A3. This invalidates the A2 
partition which further invalidates the child A3 partition. So the 
result is correct.

Cheers,
Longman
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Juri Lelli 3 days, 15 hours ago
On 18/11/24 22:28, Waiman Long wrote:
> On 11/18/24 8:58 AM, Waiman Long wrote:
> > > > The failing test isn't an isolated partition. The actual test
> > > > failure is
> > > > 
> > > > Test TEST_MATRIX[62] failed result check!
> > > > C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0
> > > > A1:0-4,A2:1-4,A3:2-4
> > > > A1:P0,A2:P-2,A3:P-1
> > > > 
> > > > In this particular case, cgroup A3 has the following setting
> > > > before the X5
> > > > operation.
> > > > 
> > > > A1/A2/A3/cpuset.cpus: 2-4
> > > > A1/A2/A3/cpuset.cpus.exclusive: 4
> > > > A1/A2/A3/cpuset.cpus.effective: 4
> > > > A1/A2/A3/cpuset.cpus.exclusive.effective: 4
> > > > A1/A2/A3/cpuset.cpus.partition: root
> > > Right, and is this problematic already?
> > We allow nested partition setup. So there can be a child partition
> > underneath a parent partition. So this is OK.
> > > 
> > > Then the test, I believe, does
> > > 
> > > # echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive
> > > 
> > > and that goes through and makes the setup invalid - root domain reconf
> > > and the following
> > > 
> > > # cat cgroup/A1/cpuset.cpus.partition
> > > member
> > > # cat cgroup/A1/A2/cpuset.cpus.partition
> > > isolated invalid (Parent is not a partition root)
> > > # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> > > root invalid (Parent is an invalid partition root)
> > > 
> > > Is this what shouldn't happen?
> > > 
> > A3 should become invalid because none of the CPUs in
> > cpuset.cpus.exclusive can be granted. However A2 should remain a valid
> > partition. I will look further into that. Thank for spotting this
> > inconsistency.
> 
> Sorry, I misread the test. The X5 entry above refers to "echo 5 >
> A1/A2/cpuset.cpus.exclusive" not to A3. This invalidates the A2 partition
> which further invalidates the child A3 partition. So the result is correct.

OK, makes sense to me. But so, the test doesn't actually fail? Sorry,
guess I'm confused. :)
Re: [PATCH] cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
Posted by Tejun Heo 1 week, 1 day ago
On Thu, Nov 14, 2024 at 01:19:15PM -0500, Waiman Long wrote:
> With some recent proposed changes [1] in the deadline server code,
> it has caused a test failure in test_cpuset_prs.sh when a change
> is being made to an isolated partition. This is due to failing
> the cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
> validate_change().
> 
> This is actually a false positive as the failed test case involves an
> isolated partition with load balancing disabled. The deadline check
> is not meaningful in this case and the users should know what they
> are doing.
> 
> Fix this by doing the cpuset_cpumask_can_shrink() check only when loading
> balanced is enabled. Also change its arguments to use effective_cpus
> for the current cpuset and user_xcpus() as an approiximation for the
> target effective_cpus as the real effective_cpus hasn't been fully
> computed yet as this early stage.
> 
> As the check isn't comprehensive, there may be false positives or
> negatives. We may have to revise the code to do a more thorough check
> in the future if this becomes a concern.
> 
> [1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

Applied to cgroup/for-6.13.

Thanks.

-- 
tejun