[pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used

Dario Faggioli posted 1 patch 3 years, 3 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/162801221667.955.3439735419862661383.stgit@Wayrath
There is a newer version of this series
xen/common/sched/credit2.c |   85 +++++++++++++++++++++++++-------------------
1 file changed, 49 insertions(+), 36 deletions(-)
[pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Dario Faggioli 3 years, 3 months ago
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.

In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have budget to be executed, we
should continue looking instead than giving up and picking the idle
unit.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
This is necessary to completely fix the bug that was described in and
addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit
from the runq if there is one").

It should, therefore, be backported and applied to all the branches to
which that commit has been. About backports, it should be
straigthforward to do that until 4.13.

For 4.12 and earlier, it's trickier, but the fix is still necessary.
Actually, both 07b0eb5d0ef0 and this patch should be backported to that
branch!

I will provide the backports myself in a email that I'll send as a
reply to this one.

Regards,
Dario
---
 xen/common/sched/credit2.c |   85 +++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index ebb09ea43a..f9b95db313 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
                         (unsigned char *)&d);
         }
 
-        /* Skip non runnable units that we (temporarily) have in the runq */
-        if ( unlikely(!unit_runnable_state(svc->unit)) )
-            continue;
-
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
-            continue;
-
         /*
-         * If an unit is meant to be picked up by another processor, and such
-         * processor has not scheduled yet, leave it in the runqueue for him.
+         * If the unit in the runqueue has more credit than current (or than
+         * idle, if current is not runnable) or if current is yielding, we may
+         * want to pick it up.
          */
-        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
-             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+        if ( (yield || svc->credit > snext->credit) )
         {
-            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
-            continue;
-        }
+            /* Skip non runnable units that we (temporarily) have in the runq */
+            if ( unlikely(!unit_runnable_state(svc->unit)) )
+                continue;
 
-        /*
-         * If this is on a different processor, don't pull it unless
-         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
-         */
-        if ( sched_unit_master(svc->unit) != cpu
-             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
-        {
-            SCHED_STAT_CRANK(migrate_resisted);
-            continue;
-        }
+            /* Only consider vcpus that are allowed to run on this processor. */
+            if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
+                continue;
 
-        /*
-         * If the one in the runqueue has more credit than current (or idle,
-         * if current is not runnable), or if current is yielding, and also
-         * if the one in runqueue either is not capped, or is capped but has
-         * some budget, then choose it.
-         */
-        if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
-            snext = svc;
+            /*
+             * If an unit is meant to be picked up by another processor, and such
+             * processor has not scheduled yet, leave it in the runqueue for him.
+             */
+            if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+                 cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+            {
+                SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+                continue;
+            }
 
-        /* In any case, if we got this far, break. */
-        break;
+            /*
+             * If this is on a different processor, don't pull it unless
+             * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+             */
+            if ( sched_unit_master(svc->unit) != cpu
+                 && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+            {
+                SCHED_STAT_CRANK(migrate_resisted);
+                continue;
+            }
+
+            /*
+             * If we are here, we are almost sure we want to pick the unit in
+             * the runqueue. Last thing we need to check is that it either is
+             * is not capped, or if it is, it has some budget.
+             *
+             * Note that cap & budget should really be the last thing we do
+             * check. In fact, unit_grab_budget() will reserve some budget for
+             * this unit, from the per-domain budget pool, and we want to do
+             * that only if we are sure that we'll then run the unit, consume
+             * some of it, and return the leftover (if any) in the usual way.
+             */
+            if ( has_cap(svc) && !unit_grab_budget(svc) )
+                continue;
+
+            /* If we got this far, we are done. */
+            snext = svc;
+            break;
+        }
     }
 
     if ( unlikely(tb_init_done) )



Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Jan Beulich 3 years, 3 months ago
On 03.08.2021 19:36, Dario Faggioli wrote:
> Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
> runq if there is one") did not fix completely the problem of potentially
> selecting a scheduling unit that will then not be able to run.
> 
> In fact, in case caps are used and the unit we are currently looking
> at, during the runqueue scan, does not have budget to be executed, we
> should continue looking instead than giving up and picking the idle
> unit.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: George Dunlap <george.dunlap@citrix.com>

Minor remark: Generally I think the order of tags should follow the
timeline: Suggestions (or bug reports) come before patch creation,
which in turns comes before reviewing / acking of a patch.

> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>

Since George is on leave and since I was Cc-ed, I thought I'd make an
attempt at reviewing this. The more that ...

> ---
> This is necessary to completely fix the bug that was described in and
> addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit
> from the runq if there is one").
> 
> It should, therefore, be backported and applied to all the branches to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.

... for 4.13.4 it would of course be nice to have it in. Things look
plausible overall, but I've got one question which - despite concerning
code you only move - may play into the underlying issue.

> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to that
> branch!

Depends on what you target with this remark: For downstreams - yes. The
stable upstream branch, otoh, is out of general support, and since this
is not a security fix it is not going to be applied to that tree.

> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>                          (unsigned char *)&d);
>          }
>  
> -        /* Skip non runnable units that we (temporarily) have in the runq */
> -        if ( unlikely(!unit_runnable_state(svc->unit)) )
> -            continue;
> -
> -        /* Only consider vcpus that are allowed to run on this processor. */
> -        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> -            continue;
> -
>          /*
> -         * If an unit is meant to be picked up by another processor, and such
> -         * processor has not scheduled yet, leave it in the runqueue for him.
> +         * If the unit in the runqueue has more credit than current (or than
> +         * idle, if current is not runnable) or if current is yielding, we may
> +         * want to pick it up.
>           */
> -        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> -             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +        if ( (yield || svc->credit > snext->credit) )

The "credit" field is plain "int", i.e. signed. Idle domain's vCPU-s
don't get INT_MIN credit afaict (there's only one use of INT_MIN
throughout the entire file). Hence I can't see why in principle a
vCPU of an ordinary domain couldn't have equal or less credit than
the CPU's idle vCPU. I therefore wonder whether "yield" shouldn't be
set to true whenever snext != scurr (or - see the top of the
function - is_idle_unit() returns true), to make sure this if()'s
body gets entered in such a case.

As a nit, there's no need for the inner parentheses (anymore).

>          {
> -            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> -            continue;
> -        }
> +            /* Skip non runnable units that we (temporarily) have in the runq */
> +            if ( unlikely(!unit_runnable_state(svc->unit)) )
> +                continue;
>  
> -        /*
> -         * If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> -         */
> -        if ( sched_unit_master(svc->unit) != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> -        {
> -            SCHED_STAT_CRANK(migrate_resisted);
> -            continue;
> -        }
> +            /* Only consider vcpus that are allowed to run on this processor. */
> +            if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> +                continue;
>  
> -        /*
> -         * If the one in the runqueue has more credit than current (or idle,
> -         * if current is not runnable), or if current is yielding, and also
> -         * if the one in runqueue either is not capped, or is capped but has
> -         * some budget, then choose it.
> -         */
> -        if ( (yield || svc->credit > snext->credit) &&
> -             (!has_cap(svc) || unit_grab_budget(svc)) )
> -            snext = svc;
> +            /*
> +             * If an unit is meant to be picked up by another processor, and such

Nit: As you move/re-indent (and hence touch) this, would you mind also
replacing "an" by "a"? I'm less certain about "such" and ...

> +             * processor has not scheduled yet, leave it in the runqueue for him.

... "him", but to me "that" and "it" respectively would seem more
suitable.

> +             */
> +            if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> +                 cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +            {
> +                SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> +                continue;
> +            }
>  
> -        /* In any case, if we got this far, break. */
> -        break;
> +            /*
> +             * If this is on a different processor, don't pull it unless
> +             * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +             */
> +            if ( sched_unit_master(svc->unit) != cpu
> +                 && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )

Again, despite the code just getting moved/re-indented, please correct
style here ("&&" to be moved to the end of the earlier line) as you
touch the code.

Otoh I'm having trouble seeing why all of this code movement / re-
indentation is necessary in the first place: If the initial if() was
inverted to

        if ( !yield && svc->credit <= snext->credit )
            continue;

less code churn would result afaict, and then the backports likely also
would become less involved (plus my stylistic remarks might evaporate,
as the affected code may then remain untouched altogether).

Jan


Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Dario Faggioli 3 years, 3 months ago
On Wed, 2021-08-04 at 09:37 +0200, Jan Beulich wrote:
> On 03.08.2021 19:36, Dario Faggioli wrote:
> > 
> > Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> > Suggested-by: George Dunlap <george.dunlap@citrix.com>
> 
> Minor remark: Generally I think the order of tags should follow the
> timeline: Suggestions (or bug reports) come before patch creation,
> which in turns comes before reviewing / acking of a patch.
> 
Right. In fact, I agree, and I keep forgetting doing that.

Thanks, will fix.

> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> 
> Since George is on leave and since I was Cc-ed, I thought I'd make an
> attempt at reviewing this. The more that ...
> 
Yep. You were Cc-ed because of the request to backport and include in
stable branches, but thank you very much for also taking the time to
have a look at it!! :-)

> > It should, therefore, be backported and applied to all the branches
> > to
> > which that commit has been. About backports, it should be
> > straigthforward to do that until 4.13.
> 
> ... for 4.13.4 it would of course be nice to have it in. Things look
> plausible overall, but I've got one question which - despite concerning
> code you only move - may play into the underlying issue.
> 
Ok.

> > For 4.12 and earlier, it's trickier, but the fix is still necessary.
> > Actually, both 07b0eb5d0ef0 and this patch should be backported to
> > that
> > branch!
> 
> Depends on what you target with this remark: For downstreams - yes. The
> stable upstream branch, otoh, is out of general support, and since this
> is not a security fix it is not going to be applied to that tree.
> 
Yeah, I know. I decided to mention this (although, I probably could
have made myself more clear) and provide a backport (of this and of the
other, already committed patch) just for convenience of both users and
downstreams that happens to use such codebases.

> >          /*
> > -         * If an unit is meant to be picked up by another processor,
> > and such
> > -         * processor has not scheduled yet, leave it in the runqueue
> > for him.
> > +         * If the unit in the runqueue has more credit than current
> > (or than
> > +         * idle, if current is not runnable) or if current is
> > yielding, we may
> > +         * want to pick it up.
> >           */
> > -        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> > -             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> > +        if ( (yield || svc->credit > snext->credit) )
> 
> The "credit" field is plain "int", i.e. signed. Idle domain's vCPU-s
> don't get INT_MIN credit afaict (there's only one use of INT_MIN
> throughout the entire file). Hence I can't see why in principle a
> vCPU of an ordinary domain couldn't have equal or less credit than
> the CPU's idle vCPU. 
>
So, if I understand what you mean, yes, we've had that issue, i.e.,
vCPUs managing to get to credit values which were lower than the one of
the idle vCPUs.

That was, in fact, even causing issue and it's what lead to
36f3662f27dec32d76c0edb4c6b62b9628d6869d "credit2: avoid vCPUs to ever
reach lower credits than idle".

After that commit, idle vCPUs' credits are set to CSCHED2_CREDIT_MIN-1
and, for regular vCPUs, whenever we subtract some value from their
credits, we limit them to not go beyond CSCHED2_CREDIT_MIN (this
happens in t2c_update(), called by burn_credit()).

Therefore, it should now not be possible any longer for regular vCPUs
to fall behind idle vCPUs, in terms of amount of credits.

So, was it this you were asking about and, if yes, does this answer
your concerns?

> Otoh I'm having trouble seeing why all of this code movement / re-
> indentation is necessary in the first place: If the initial if() was
> inverted to
> 
>         if ( !yield && svc->credit <= snext->credit )
>             continue;
> 
Actually, I am just realizing that if I, instead, use:

        if ( !yield && svc->credit <= snext->credit )
            break;

It would be much better (even as compared to the current situation).

In fact, right now that the priority check is toward the end, we have
do the checks that comes earlier in the loop (is it runnable? Can it
run on this CPU? Is it worth migrating it? Etc) at least for one
element of the runqueue.

With either my code or above the suggested form, we don't, but we may
have to do at least the priority check for all the elements of the
runqueue. This was something I knew, and as a matter of fact, it should
be quick enough (and comparable with doing expensive checks even on
just 1 vCPU). But still, it's ugly.

However, we know that the runqueue is sorted by credits! So, unless
we're yielding, it is always the case that as soon as we find there an
unit that has less credit than snext, we want to bail (and keep running
snext).

This means that we neither scan all the runqueues, not even for doing
just quick priority checks, nor we, in the case that snext is the
actual highest priority unit need to do any check for the unit at the
top of the runqueue.

So, I'm actually re-doing (and re-testing) the patch in this way.

Thanks again and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Jan Beulich 3 years, 3 months ago
On 04.08.2021 15:28, Dario Faggioli wrote:
> On Wed, 2021-08-04 at 09:37 +0200, Jan Beulich wrote:
>> On 03.08.2021 19:36, Dario Faggioli wrote:
>>>
>>> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
>>> Suggested-by: George Dunlap <george.dunlap@citrix.com>
>>
>> Minor remark: Generally I think the order of tags should follow the
>> timeline: Suggestions (or bug reports) come before patch creation,
>> which in turns comes before reviewing / acking of a patch.
>>
> Right. In fact, I agree, and I keep forgetting doing that.
> 
> Thanks, will fix.
> 
>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>
>> Since George is on leave and since I was Cc-ed, I thought I'd make an
>> attempt at reviewing this. The more that ...
>>
> Yep. You were Cc-ed because of the request to backport and include in
> stable branches, but thank you very much for also taking the time to
> have a look at it!! :-)
> 
>>> It should, therefore, be backported and applied to all the branches
>>> to
>>> which that commit has been. About backports, it should be
>>> straigthforward to do that until 4.13.
>>
>> ... for 4.13.4 it would of course be nice to have it in. Things look
>> plausible overall, but I've got one question which - despite concerning
>> code you only move - may play into the underlying issue.
>>
> Ok.
> 
>>> For 4.12 and earlier, it's trickier, but the fix is still necessary.
>>> Actually, both 07b0eb5d0ef0 and this patch should be backported to
>>> that
>>> branch!
>>
>> Depends on what you target with this remark: For downstreams - yes. The
>> stable upstream branch, otoh, is out of general support, and since this
>> is not a security fix it is not going to be applied to that tree.
>>
> Yeah, I know. I decided to mention this (although, I probably could
> have made myself more clear) and provide a backport (of this and of the
> other, already committed patch) just for convenience of both users and
> downstreams that happens to use such codebases.
> 
>>>          /*
>>> -         * If an unit is meant to be picked up by another processor,
>>> and such
>>> -         * processor has not scheduled yet, leave it in the runqueue
>>> for him.
>>> +         * If the unit in the runqueue has more credit than current
>>> (or than
>>> +         * idle, if current is not runnable) or if current is
>>> yielding, we may
>>> +         * want to pick it up.
>>>           */
>>> -        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
>>> -             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
>>> +        if ( (yield || svc->credit > snext->credit) )
>>
>> The "credit" field is plain "int", i.e. signed. Idle domain's vCPU-s
>> don't get INT_MIN credit afaict (there's only one use of INT_MIN
>> throughout the entire file). Hence I can't see why in principle a
>> vCPU of an ordinary domain couldn't have equal or less credit than
>> the CPU's idle vCPU. 
>>
> So, if I understand what you mean, yes, we've had that issue, i.e.,
> vCPUs managing to get to credit values which were lower than the one of
> the idle vCPUs.
> 
> That was, in fact, even causing issue and it's what lead to
> 36f3662f27dec32d76c0edb4c6b62b9628d6869d "credit2: avoid vCPUs to ever
> reach lower credits than idle".
> 
> After that commit, idle vCPUs' credits are set to CSCHED2_CREDIT_MIN-1
> and, for regular vCPUs, whenever we subtract some value from their
> credits, we limit them to not go beyond CSCHED2_CREDIT_MIN (this
> happens in t2c_update(), called by burn_credit()).
> 
> Therefore, it should now not be possible any longer for regular vCPUs
> to fall behind idle vCPUs, in terms of amount of credits.
> 
> So, was it this you were asking about and, if yes, does this answer
> your concerns?

Yes, it does. I continue to think though that the "yield" variable
could do with either a comment along the lines of what you've
explained, or with it getting set to true in more cases (as
indicated), as the interaction with the credit comparisons isn't
very obvious right now (to me at least).

>> Otoh I'm having trouble seeing why all of this code movement / re-
>> indentation is necessary in the first place: If the initial if() was
>> inverted to
>>
>>         if ( !yield && svc->credit <= snext->credit )
>>             continue;
>>
> Actually, I am just realizing that if I, instead, use:
> 
>         if ( !yield && svc->credit <= snext->credit )
>             break;
> 
> It would be much better (even as compared to the current situation).
> 
> In fact, right now that the priority check is toward the end, we have
> do the checks that comes earlier in the loop (is it runnable? Can it
> run on this CPU? Is it worth migrating it? Etc) at least for one
> element of the runqueue.
> 
> With either my code or above the suggested form, we don't, but we may
> have to do at least the priority check for all the elements of the
> runqueue. This was something I knew, and as a matter of fact, it should
> be quick enough (and comparable with doing expensive checks even on
> just 1 vCPU). But still, it's ugly.
> 
> However, we know that the runqueue is sorted by credits!

Oh - I was first thinking it might be, but seeing all the logic I
(wrongly, as you now tell me) inferred it's unsorted.

Jan

> So, unless
> we're yielding, it is always the case that as soon as we find there an
> unit that has less credit than snext, we want to bail (and keep running
> snext).
> 
> This means that we neither scan all the runqueues, not even for doing
> just quick priority checks, nor we, in the case that snext is the
> actual highest priority unit need to do any check for the unit at the
> top of the runqueue.
> 
> So, I'm actually re-doing (and re-testing) the patch in this way.
> 
> Thanks again and Regards
> 


Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Dario Faggioli 3 years, 3 months ago
On Wed, 2021-08-04 at 17:13 +0200, Jan Beulich wrote:
> On 04.08.2021 15:28, Dario Faggioli wrote:
> > 
> > So, was it this you were asking about and, if yes, does this answer
> > your concerns?
> 
> Yes, it does. 
>
Ok, great. :-)

> I continue to think though that the "yield" variable
> could do with either a comment along the lines of what you've
> explained, or with it getting set to true in more cases (as
> indicated), as the interaction with the credit comparisons isn't
> very obvious right now (to me at least).
> 
Fine. This is not for this patch, of course. But I'll make a note about
it, and think how to restructure the code to make it easier to
understand, in a followup patch.

> > With either my code or above the suggested form, we don't, but we
> > may
> > have to do at least the priority check for all the elements of the
> > runqueue. This was something I knew, and as a matter of fact, it
> > should
> > be quick enough (and comparable with doing expensive checks even on
> > just 1 vCPU). But still, it's ugly.
> > 
> > However, we know that the runqueue is sorted by credits!
> 
> Oh - I was first thinking it might be, but seeing all the logic I
> (wrongly, as you now tell me) inferred it's unsorted.
> 
Yeah, it certainly wasn't evident, from how the patch was done.

Thanks for your comment, that actually made it realize that I could do
things like that, and indeed with even so much less code churn!

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: [PATCH] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Dario Faggioli 3 years, 3 months ago
On Tue, 2021-08-03 at 19:36 +0200, Dario Faggioli wrote:
> It should, therefore, be backported and applied to all the branches
> to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.
> 
> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to
> that
> branch!
> 
> I will provide the backports myself in a email that I'll send as a
> reply to this one.
> 
And here they are, attached, the various backports, as they are
described in the paragraph above.

Hope this helps.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: [pli PATCH bla] Xen: credit2: avoid picking a spurious idle unit when caps are used
Posted by Dario Faggioli 3 years, 3 months ago
Err... of course, the "pli" and "bla" stuff between the [] are a
leftover of some experiments that I had to do with `stg email`, due to
mail handling changes, and should really not have been there in this
email...

Sorry. :-/

On Tue, 2021-08-03 at 19:36 +0200, Dario Faggioli wrote:
> Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from
> the
> runq if there is one") did not fix completely the problem of
> potentially
> selecting a scheduling unit that will then not be able to run.
> 
> In fact, in case caps are used and the unit we are currently looking
> at, during the runqueue scan, does not have budget to be executed, we
> should continue looking instead than giving up and picking the idle
> unit.
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
> This is necessary to completely fix the bug that was described in and
> addressed by 07b0eb5d0ef0 ("credit2: make sure we pick a runnable
> unit
> from the runq if there is one").
> 
> It should, therefore, be backported and applied to all the branches
> to
> which that commit has been. About backports, it should be
> straigthforward to do that until 4.13.
> 
> For 4.12 and earlier, it's trickier, but the fix is still necessary.
> Actually, both 07b0eb5d0ef0 and this patch should be backported to
> that
> branch!
> 
> I will provide the backports myself in a email that I'll send as a
> reply to this one.
> 
> Regards,
> Dario
> ---
>  xen/common/sched/credit2.c |   85 +++++++++++++++++++++++++---------
> ----------
>  1 file changed, 49 insertions(+), 36 deletions(-)
> 
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index ebb09ea43a..f9b95db313 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3463,48 +3463,61 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
>                          (unsigned char *)&d);
>          }
>  
> -        /* Skip non runnable units that we (temporarily) have in the
> runq */
> -        if ( unlikely(!unit_runnable_state(svc->unit)) )
> -            continue;
> -
> -        /* Only consider vcpus that are allowed to run on this
> processor. */
> -        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
> -            continue;
> -
>          /*
> -         * If an unit is meant to be picked up by another processor,
> and such
> -         * processor has not scheduled yet, leave it in the runqueue
> for him.
> +         * If the unit in the runqueue has more credit than current
> (or than
> +         * idle, if current is not runnable) or if current is
> yielding, we may
> +         * want to pick it up.
>           */
> -        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> -             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +        if ( (yield || svc->credit > snext->credit) )
>          {
> -            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> -            continue;
> -        }
> +            /* Skip non runnable units that we (temporarily) have in
> the runq */
> +            if ( unlikely(!unit_runnable_state(svc->unit)) )
> +                continue;
>  
> -        /*
> -         * If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> -         */
> -        if ( sched_unit_master(svc->unit) != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit
> )
> -        {
> -            SCHED_STAT_CRANK(migrate_resisted);
> -            continue;
> -        }
> +            /* Only consider vcpus that are allowed to run on this
> processor. */
> +            if ( !cpumask_test_cpu(cpu, svc->unit-
> >cpu_hard_affinity) )
> +                continue;
>  
> -        /*
> -         * If the one in the runqueue has more credit than current
> (or idle,
> -         * if current is not runnable), or if current is yielding,
> and also
> -         * if the one in runqueue either is not capped, or is capped
> but has
> -         * some budget, then choose it.
> -         */
> -        if ( (yield || svc->credit > snext->credit) &&
> -             (!has_cap(svc) || unit_grab_budget(svc)) )
> -            snext = svc;
> +            /*
> +             * If an unit is meant to be picked up by another
> processor, and such
> +             * processor has not scheduled yet, leave it in the
> runqueue for him.
> +             */
> +            if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu
> &&
> +                 cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +            {
> +                SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> +                continue;
> +            }
>  
> -        /* In any case, if we got this far, break. */
> -        break;
> +            /*
> +             * If this is on a different processor, don't pull it
> unless
> +             * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +             */
> +            if ( sched_unit_master(svc->unit) != cpu
> +                 && snext->credit + CSCHED2_MIGRATE_RESIST > svc-
> >credit )
> +            {
> +                SCHED_STAT_CRANK(migrate_resisted);
> +                continue;
> +            }
> +
> +            /*
> +             * If we are here, we are almost sure we want to pick
> the unit in
> +             * the runqueue. Last thing we need to check is that it
> either is
> +             * is not capped, or if it is, it has some budget.
> +             *
> +             * Note that cap & budget should really be the last
> thing we do
> +             * check. In fact, unit_grab_budget() will reserve some
> budget for
> +             * this unit, from the per-domain budget pool, and we
> want to do
> +             * that only if we are sure that we'll then run the
> unit, consume
> +             * some of it, and return the leftover (if any) in the
> usual way.
> +             */
> +            if ( has_cap(svc) && !unit_grab_budget(svc) )
> +                continue;
> +
> +            /* If we got this far, we are done. */
> +            snext = svc;
> +            break;
> +        }
>      }
>  
>      if ( unlikely(tb_init_done) )
> 
> 
> 

-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)