Some recently proposed changes [1] in the deadline server code cause
a test failure in test_cpuset_prs.sh when a change is made to an
isolated partition. This is due to failing the
cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
validate_change().
This is actually a false positive as the failed test case involves an
isolated partition with load balancing disabled. The deadline check
is not meaningful in this case and the users should know what they
are doing.
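For background, cpuset_cpumask_can_shrink() boils down to a
SCHED_DEADLINE bandwidth admission test: the shrink is rejected if the
deadline bandwidth already reserved no longer fits in the smaller mask.
A minimal sketch of the idea, assuming a flat per-CPU capacity model
(illustrative names only, not the exact kernel implementation):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: shrinking a cpumask is allowed iff the SCHED_DEADLINE
 * bandwidth already reserved on the current mask still fits in the
 * total capacity of the smaller trial mask.
 */
static bool dl_can_shrink_sketch(uint64_t reserved_dl_bw,
				 unsigned int trial_ncpus,
				 uint64_t per_cpu_cap)
{
	/* capacity of the would-be (smaller) cpumask */
	uint64_t trial_cap = (uint64_t)trial_ncpus * per_cpu_cap;

	return reserved_dl_bw <= trial_cap;
}

With load balancing disabled, the partition's CPUs sit outside any
shared root domain, so this accounting carries little information there.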
Fix this by doing the cpuset_cpumask_can_shrink() check only when load
balancing is enabled. Also change its arguments to use effective_cpus
for the current cpuset and user_xcpus() as an approximation for the
target effective_cpus, as the real effective_cpus hasn't been fully
computed yet at this early stage.
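(For reference, user_xcpus() returns the user-requested exclusive CPUs
when any are set and falls back to cpus_allowed otherwise; a simplified
sketch of the helper, not necessarily its exact form in
kernel/cgroup/cpuset.c:)

/*
 * Simplified sketch of the user_xcpus() helper: prefer the user-written
 * exclusive_cpus mask; fall back to cpus_allowed when no exclusive CPUs
 * have been requested.
 */
static struct cpumask *user_xcpus(struct cpuset *cs)
{
	return cpumask_empty(cs->exclusive_cpus) ? cs->cpus_allowed
						 : cs->exclusive_cpus;
}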
As the check isn't comprehensive, there may be false positives or
negatives. We may have to revise the code to do a more thorough check
in the future if this becomes a concern.
[1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 655396e75b58..21050cacff0d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -581,12 +581,24 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
/*
* We can't shrink if we won't have enough room for SCHED_DEADLINE
- * tasks.
+ * tasks. This check is not done when load balancing is disabled as the
+ * users should know what they are doing.
+ *
+ * For v1, effective_cpus == cpus_allowed & user_xcpus() returns
+ * cpus_allowed.
+ *
+ * For v2, is_cpu_exclusive() & is_sched_load_balance() are true only
+ * for non-isolated partition root. At this point, the target
+ * effective_cpus isn't computed yet. user_xcpus() is the best
+ * approximation.
+ *
+ * TBD: May need to precompute the real effective_cpus here in case
+ * incorrect scheduling of SCHED_DEADLINE tasks in a partition
+ * becomes an issue.
*/
ret = -EBUSY;
- if (is_cpu_exclusive(cur) &&
- !cpuset_cpumask_can_shrink(cur->cpus_allowed,
- trial->cpus_allowed))
+ if (is_cpu_exclusive(cur) && is_sched_load_balance(cur) &&
+ !cpuset_cpumask_can_shrink(cur->effective_cpus, user_xcpus(trial)))
goto out;
/*
--
2.47.0
Hi Waiman,

On 14/11/24 13:19, Waiman Long wrote:
> Some recently proposed changes [1] in the deadline server code cause
> a test failure in test_cpuset_prs.sh when a change is made to an
> isolated partition. This is due to failing the
> cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
> validate_change().

What sort of change is being made to that isolated partition? Which test
is failing from the test_cpuset_prs.sh collection? Asking because I now
see "All tests PASSED" running that locally (with all my 3 patches on
top of cgroup/for-6.13 w/o this last patch from you).

Thanks,
Juri
On 11/15/24 5:54 AM, Juri Lelli wrote:
> What sort of change is being made to that isolated partition? Which test
> is failing from the test_cpuset_prs.sh collection? Asking because I now
> see "All tests PASSED" running that locally (with all my 3 patches on
> top of cgroup/for-6.13 w/o this last patch from you).

The failing test isn't an isolated partition. The actual test failure is

Test TEST_MATRIX[62] failed result check!
C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 A1:0-4,A2:1-4,A3:2-4
A1:P0,A2:P-2,A3:P-1

In this particular case, cgroup A3 has the following setting before the
X5 operation.

A1/A2/A3/cpuset.cpus: 2-4
A1/A2/A3/cpuset.cpus.exclusive: 4
A1/A2/A3/cpuset.cpus.effective: 4
A1/A2/A3/cpuset.cpus.exclusive.effective: 4
A1/A2/A3/cpuset.cpus.partition: root

I believe this is fixed by my other change in the commit to change the
arguments of cpuset_cpumask_can_shrink() to use effective_cpus instead,
as cpus_allowed may not represent what CPUs are currently used in the
partition.

Cheers,
Longman
On 15/11/24 12:55, Waiman Long wrote:
> The failing test isn't an isolated partition. The actual test failure is
>
> Test TEST_MATRIX[62] failed result check!
> C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 . . X5 . . 0 A1:0-4,A2:1-4,A3:2-4
> A1:P0,A2:P-2,A3:P-1
>
> In this particular case, cgroup A3 has the following setting before the
> X5 operation.
>
> A1/A2/A3/cpuset.cpus: 2-4
> A1/A2/A3/cpuset.cpus.exclusive: 4
> A1/A2/A3/cpuset.cpus.effective: 4
> A1/A2/A3/cpuset.cpus.exclusive.effective: 4
> A1/A2/A3/cpuset.cpus.partition: root

Right, and is this problematic already?

Then the test, I believe, does

# echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive

and that goes through and makes the setup invalid - root domain reconf
and the following

# cat cgroup/A1/cpuset.cpus.partition
member
# cat cgroup/A1/A2/cpuset.cpus.partition
isolated invalid (Parent is not a partition root)
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
root invalid (Parent is an invalid partition root)

Is this what shouldn't happen?
On 11/18/24 4:39 AM, Juri Lelli wrote:
> Right, and is this problematic already?

We allow nested partition setup. So there can be a child partition
underneath a parent partition. So this is OK.

> Then the test, I believe, does
>
> # echo 5 >cgroup/A1/A2/cpuset.cpus.exclusive
>
> and that goes through and makes the setup invalid - root domain reconf
> and the following
>
> # cat cgroup/A1/cpuset.cpus.partition
> member
> # cat cgroup/A1/A2/cpuset.cpus.partition
> isolated invalid (Parent is not a partition root)
> # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> root invalid (Parent is an invalid partition root)
>
> Is this what shouldn't happen?

A3 should become invalid because none of the CPUs in
cpuset.cpus.exclusive can be granted. However, A2 should remain a valid
partition. I will look further into that. Thanks for spotting this
inconsistency.

Cheers,
Longman
On 11/18/24 8:58 AM, Waiman Long wrote:
> A3 should become invalid because none of the CPUs in
> cpuset.cpus.exclusive can be granted. However, A2 should remain a valid
> partition. I will look further into that. Thanks for spotting this
> inconsistency.

Sorry, I misread the test. The X5 entry above refers to
"echo 5 > A1/A2/cpuset.cpus.exclusive", not to A3. This invalidates the
A2 partition, which further invalidates the child A3 partition. So the
result is correct.

Cheers,
Longman
On 18/11/24 22:28, Waiman Long wrote:
> Sorry, I misread the test. The X5 entry above refers to
> "echo 5 > A1/A2/cpuset.cpus.exclusive", not to A3. This invalidates the
> A2 partition, which further invalidates the child A3 partition. So the
> result is correct.

OK, makes sense to me. But so, the test doesn't actually fail? Sorry,
guess I'm confused. :)
On Thu, Nov 14, 2024 at 01:19:15PM -0500, Waiman Long wrote:
> Some recently proposed changes [1] in the deadline server code cause
> a test failure in test_cpuset_prs.sh when a change is made to an
> isolated partition. This is due to failing the
> cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
> validate_change().
>
> This is actually a false positive as the failed test case involves an
> isolated partition with load balancing disabled. The deadline check
> is not meaningful in this case and the users should know what they
> are doing.
>
> Fix this by doing the cpuset_cpumask_can_shrink() check only when load
> balancing is enabled. Also change its arguments to use effective_cpus
> for the current cpuset and user_xcpus() as an approximation for the
> target effective_cpus, as the real effective_cpus hasn't been fully
> computed yet at this early stage.
>
> As the check isn't comprehensive, there may be false positives or
> negatives. We may have to revise the code to do a more thorough check
> in the future if this becomes a concern.
>
> [1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t
>
> Signed-off-by: Waiman Long <longman@redhat.com>

Applied to cgroup/for-6.13.

Thanks.

--
tejun