After properly setting up a 1-cpu sched domain (partition) with no
task, it was found that offlining that particular CPU failed because
dl_bw_check_overflow() in cpuset_cpu_inactive() returned -EBUSY. This
is due to the fact that dl_bw_capacity() returns 0 as the sched domain
has no active CPU, causing a false positive in the overflow check.

Fix this corner case by skipping the __dl_overflow() check in
dl_bw_manage() when the returned capacity is 0.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/sched/deadline.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index be1b917dc8ce..0195f350d6d3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3479,7 +3479,13 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 	} else {
 		unsigned long cap = dl_bw_capacity(cpu);
 
-		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+		/*
+		 * In the unlikely case of 0 capacity (e.g. a sched domain
+		 * with no active CPUs), skip the overflow check as it will
+		 * always return a false positive.
+		 */
+		if (likely(cap))
+			overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
 
 		if (req == dl_bw_req_alloc && !overflow) {
 			/*
--
2.47.0
On 07/11/24 23:29, Waiman Long wrote:
> [...]
>
> -		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
> +		/*
> +		 * In the unlikely case of 0 capacity (e.g. a sched domain
> +		 * with no active CPUs), skip the overflow check as it will
> +		 * always return a false positive.
> +		 */
> +		if (likely(cap))
> +			overflow = __dl_overflow(dl_b, cap, 0, dl_bw);

The remaining total_bw that makes this check fail should be the one
relative to the dl_server on the cpu that is going offline. Wonder if we
shouldn't rather clean that up (remove dl_server contribution) before we
get to this point during a hotplug operation. Need to think about it a
little more.

Thanks,
Juri
On 11/8/24 8:25 AM, Juri Lelli wrote:
> On 07/11/24 23:29, Waiman Long wrote:
>> [...]
>> -		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
>> +		/*
>> +		 * In the unlikely case of 0 capacity (e.g. a sched domain
>> +		 * with no active CPUs), skip the overflow check as it will
>> +		 * always return a false positive.
>> +		 */
>> +		if (likely(cap))
>> +			overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
>
> The remaining total_bw that makes this check fail should be the one
> relative to the dl_server on the cpu that is going offline. Wonder if we
> shouldn't rather clean that up (remove dl_server contribution) before we
> get to this point during a hotplug operation. Need to think about it a
> little more.

static inline bool
__dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
{
	return dl_b->bw != -1 &&
	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
}

With a 0 cap, cap_scale(dl_b->bw, cap) will always be 0. As long as
total_bw isn't 0 and bw isn't -1, the condition will be true.

Cheers,
Longman
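For reference, the arithmetic described above can be reproduced with a
minimal userspace sketch. This is not kernel code: the bandwidth values
below are hypothetical, and SCHED_CAPACITY_SHIFT is assumed to be 10,
matching the kernel's fixed-point capacity scale of 1024 per CPU.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define SCHED_CAPACITY_SHIFT	10	/* assumed kernel fixed-point shift */
	#define cap_scale(v, cap)	(((v) * (cap)) >> SCHED_CAPACITY_SHIFT)

	/* Mirror of __dl_overflow(), with the struct fields passed directly. */
	static bool dl_overflow(uint64_t bw, uint64_t total_bw, unsigned long cap,
				uint64_t old_bw, uint64_t new_bw)
	{
		return bw != (uint64_t)-1 &&
		       cap_scale(bw, cap) < total_bw - old_bw + new_bw;
	}

	int main(void)
	{
		uint64_t bw = 973;	/* hypothetical ~95% per-CPU DL limit */
		uint64_t total_bw = 50;	/* hypothetical leftover dl_server bandwidth */

		/* cap == 0: the LHS collapses to 0, so any nonzero total_bw
		 * "overflows" -- the false positive. Prints overflow=1. */
		printf("cap=0:    overflow=%d\n", dl_overflow(bw, total_bw, 0, 0, 0));

		/* cap == 1024 (one full-capacity CPU): 973 < 50 is false,
		 * so no overflow. Prints overflow=0. */
		printf("cap=1024: overflow=%d\n", dl_overflow(bw, total_bw, 1024, 0, 0));
		return 0;
	}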
On 08/11/24 08:39, Waiman Long wrote:
> On 11/8/24 8:25 AM, Juri Lelli wrote:
>> [...]
>> The remaining total_bw that makes this check fail should be the one
>> relative to the dl_server on the cpu that is going offline. Wonder if we
>> shouldn't rather clean that up (remove dl_server contribution) before we
>> get to this point during a hotplug operation. Need to think about it a
>> little more.
>
> static inline bool
> __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
> {
> 	return dl_b->bw != -1 &&
> 	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
> }
>
> With a 0 cap, cap_scale(dl_b->bw, cap) will always be 0. As long as
> total_bw isn't 0 and bw isn't -1, the condition will be true.

Right, but I fear that by hiding the special corner case we would also
miss the cases where we have DEADLINE tasks with bandwidth on that
single CPU and then ignore it.

So, maybe something like the below?

---
 kernel/sched/deadline.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index be1b917dc8ce..7884e566557c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -130,11 +130,24 @@ static inline int dl_bw_cpus(int i)
 	if (cpumask_subset(rd->span, cpu_active_mask))
 		return cpumask_weight(rd->span);
 
+	/*
+	 * Hotplug extreme case in which the last remaining online CPU in a
+	 * root domain is going offline. We get here after that cpu has been
+	 * cleared from cpu_active_mask, so the loop below would return 0
+	 * CPUs, which of course doesn't make much sense. Return at least 1
+	 * CPU.
+	 */
+
+	if (cpumask_weight(rd->span) == 1)
+		return 1;
+
 	cpus = 0;
 
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
+	WARN_ON(!cpus);
+
 	return cpus;
 }
---

That said though, I believe I just found an additional issue. With the
above the system doesn't crash (it did w/o it), but happily moves
DEADLINE tasks out of a domain with a single CPU going offline. Last
time I looked at this we were properly checking and failing the hotplug
operation, but it was indeed a while ago, so not sure yet what changed.
More staring.

Oh, so broken, yay. :)

Best,
Juri
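To make the hotplug ordering above concrete, here is a toy model of the
corner case the new comment describes. Plain unsigned ints stand in for
kernel cpumasks and the CPU number is made up; it only illustrates why
counting active CPUs in the span yields 0 once the last CPU of a 1-CPU
root domain has been cleared from cpu_active_mask.

	#include <stdio.h>

	/* Count bits set in both masks, like for_each_cpu_and() counting CPUs. */
	static int weight_and(unsigned int span, unsigned int active)
	{
		int cpus = 0;

		for (unsigned int m = span & active; m; m &= m - 1)
			cpus++;
		return cpus;
	}

	int main(void)
	{
		unsigned int span = 1u << 3;		/* 1-CPU root domain: CPU 3 only */
		unsigned int active = 0xffu & ~span;	/* CPU 3 already cleared on offline */

		/* cpumask_subset(rd->span, cpu_active_mask) is false here, so
		 * the counting loop runs and finds nothing: prints 0. */
		printf("active CPUs in span: %d\n", weight_and(span, active));
		return 0;
	}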
On 11/8/24 11:31 AM, Juri Lelli wrote:
> That said though, I believe I just found an additional issue. With the
> above the system doesn't crash (it did w/o it), but happily moves
> DEADLINE tasks out of a domain with a single CPU going offline. Last
> time I looked at this we were properly checking and failing the hotplug
> operation, but it was indeed a while ago, so not sure yet what changed.
> More staring.
>
> Oh, so broken, yay. :)

Is it the case that the null total_bw bug lets the cpu offline succeed
in a 1-cpu partition with a DL task, causing it to crash? I am also OK
if you add a check for the presence of a DL task on the cpu and fail
the offline in this case.

Thanks,
Longman
On 11/8/24 11:31 AM, Juri Lelli wrote:
> On 08/11/24 08:39, Waiman Long wrote:
>> [...]
>> With a 0 cap, cap_scale(dl_b->bw, cap) will always be 0. As long as
>> total_bw isn't 0 and bw isn't -1, the condition will be true.
>
> Right, but I fear that by hiding the special corner case we would also
> miss the cases where we have DEADLINE tasks with bandwidth on that
> single CPU and then ignore it.
>
> So, maybe something like the below?
>
> [...]

This patch looks good to me. With this patch and my cpuset patches
applied, the test_cpuset_prs.sh test passed without any error. You can
add the following tags if you send this patch out.

Reported-by: Waiman Long <longman@redhat.com>
Tested-by: Waiman Long <longman@redhat.com>

> That said though, I believe I just found an additional issue. With the
> above the system doesn't crash (it did w/o it), but happily moves
> DEADLINE tasks out of a domain with a single CPU going offline. Last
> time I looked at this we were properly checking and failing the hotplug
> operation, but it was indeed a while ago, so not sure yet what changed.
> More staring.
>
> Oh, so broken, yay. :)

The cpuset code will move tasks in a partition with no effective CPUs
to its parent.

Cheers,
Longman