A !runnable unit (temporarily) present in the runq may cause us to
stop scanning the runq itself too early. Of course, we don't run any
non-runnable vCPUs, but we end the scan and we fallback to picking
the idle unit. In other word, this prevent us to find there and pick
the actual unit that we're meant to start running (which might be
further ahead in the runq).
Depending on the vCPU pinning configuration, this may lead to such
unit to be stuck in the runq for long time, causing malfunctioning
inside the guest.
Fix this by checking runnable/non-runnable status up-front, in the runq
scanning function.
Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Reported-by: Dion Kant <g.w.kant@hunenet.nl>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Michał Leszczyński <michal.leszczynski@cert.pl>
Cc: Dion Kant <g.w.kant@hunenet.nl>
---
This is a bugfix and it solves the following problems, reported in
various ways:
* https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
* https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
* https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
Hence, it should be backported, I'd say as far as possible... At least
to all the releases that have Credit2 as the default scheduler.
I will look further into this, and I think I can provide the backports
myself.
I'd like to send a *huge* thank you to Dion Kant who arranged for me to
be able to use a box where it was particularly easy to reproduce the
bug, and that was for all the time that it took me to finally be able to
work on this properly and nail it! :-)
---
xen/common/sched/credit2.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index eb5e5a78c5..f5c1e5b944 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3463,6 +3463,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
(unsigned char *)&d);
}
+ /* Skip non runnable units that we (temporarily) have in the runq */
+ if ( unlikely(!unit_runnable_state(svc->unit)) )
+ continue;
+
/* Only consider vcpus that are allowed to run on this processor. */
if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
continue;
@@ -3496,8 +3500,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
* some budget, then choose it.
*/
if ( (yield || svc->credit > snext->credit) &&
- (!has_cap(svc) || unit_grab_budget(svc)) &&
- unit_runnable_state(svc->unit) )
+ (!has_cap(svc) || unit_grab_budget(svc)) )
snext = svc;
/* In any case, if we got this far, break. */
> On May 28, 2021, at 4:12 PM, Dario Faggioli <dfaggioli@suse.com> wrote: > > A !runnable unit (temporarily) present in the runq may cause us to > stop scanning the runq itself too early. Of course, we don't run any > non-runnable vCPUs, but we end the scan and we fallback to picking > the idle unit. In other word, this prevent us to find there and pick > the actual unit that we're meant to start running (which might be > further ahead in the runq). > > Depending on the vCPU pinning configuration, this may lead to such > unit to be stuck in the runq for long time, causing malfunctioning > inside the guest. > > Fix this by checking runnable/non-runnable status up-front, in the runq > scanning function. > > Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl> > Reported-by: Dion Kant <g.w.kant@hunenet.nl> > Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Thanks for tracking this down, Dario! Reviewed-by: George Dunlap <george.dunlap@citrix.com> Just one comment: > @@ -3496,8 +3500,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, > * some budget, then choose it. > */ > if ( (yield || svc->credit > snext->credit) && > - (!has_cap(svc) || unit_grab_budget(svc)) && > - unit_runnable_state(svc->unit) ) > + (!has_cap(svc) || unit_grab_budget(svc)) ) > snext = svc; By the same logic, shouldn’t we also move the `(!has_cap() …)` clause into a separate `if(x) continue` clause? There may be runnable units further down the queue which aren’t capped / haven’t exhausted their budget yet. -George
Hi George (and sorry for the delay in replying), On Mon, 2021-06-07 at 12:10 +0000, George Dunlap wrote: > > On May 28, 2021, at 4:12 PM, Dario Faggioli <dfaggioli@suse.com> > > wrote: > > Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl> > > Reported-by: Dion Kant <g.w.kant@hunenet.nl> > > Signed-off-by: Dario Faggioli <dfaggioli@suse.com> > > Thanks for tracking this down, Dario! > Hehe, this was a nasty one indeed! :-P > Reviewed-by: George Dunlap <george.dunlap@citrix.com> > > Just one comment: > > @@ -3496,8 +3500,7 @@ runq_candidate(struct csched2_runqueue_data > > *rqd, > > * some budget, then choose it. > > */ > > if ( (yield || svc->credit > snext->credit) && > > - (!has_cap(svc) || unit_grab_budget(svc)) && > > - unit_runnable_state(svc->unit) ) > > + (!has_cap(svc) || unit_grab_budget(svc)) ) > > snext = svc; > > By the same logic, shouldn’t we also move the `(!has_cap() …)` clause > into a separate `if(x) continue` clause? There may be runnable units > further down the queue which aren’t capped / haven’t exhausted their > budget yet. > That's actually a very good point. I think, however, that it's even more complicated than this. In fact, if I move the cap+budget check in its own if above this one, it can happen that some budget is grabbed by the unit. If, however, we then don't pick it up (because of priority) we then need to have it return the budget right away, which it's tricky. So, I came up with the solution of turning the above `if` into this: /* * If the one in the runqueue has more credit than current (or idle, * if current was not runnable) or if current is yielding, we can try * to actually pick it. */ if ( (yield || svc->credit > snext->credit) ) { /* * The last thing we need to check, is whether the unit has enough * budget, in case it is capped. We need to do it now, when we are * otherwise sure that we want to pick it already (rather than in * its own independent 'if'), to avoid the hassle of needing to * return the budget (which we'd have to if we checked and grabbed * it but then decide to run someone else). */ if ( has_cap(svc) && !unit_grab_budget(svc) ) continue; /* And if we have go this far, we are done. */ snext = svc; break; } Of course, something else that we can do is to pull the `if (yield || svc->credit > snext->credit))` "out" of all the other checks, i.e., doing all the checks we do only either if we're yielding or if the unit in the runq actually has higher priority. Efficiency wise, I don't think there will be much difference. The latter solution is more code-churn (we're basically re-indenting one level to the right almost the entire `list_for_each_safe` body), but it might be more readable and easy to understand and follow in the long run. So, any preference? :-) Thanks and Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/ ------------------------------------------------------------------- <<This happens because _I_ choose it to happen!>> (Raistlin Majere)
© 2016 - 2024 Red Hat, Inc.