xen/sched: fix race in RTDS scheduler

[PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Posted by Juergen Gross 1 year, 6 months ago

When a domain gets paused the unit runnable state can change to "not
runnable" without the scheduling lock being involved. This means that
a specific scheduler isn't involved in this change of runnable state.

In the RTDS scheduler this can result in an inconsistency in case a
unit is losing its "runnable" capability while the RTDS scheduler's
scheduling function is active. RTDS will remove the unit from the run
queue, but doesn't do so for the replenish queue, leading to hitting
an ASSERT() in replq_insert() later when the domain is unpaused again.

Fix that by removing the unit from the replenish queue as well in this
case.

Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/rt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
index d6de25531b..960a8033e2 100644
--- a/xen/common/sched/rt.c
+++ b/xen/common/sched/rt.c
@@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
         else if ( !unit_runnable_state(snext->unit) )
         {
             q_remove(snext);
+            replq_remove(ops, snext);
             snext = rt_unit(sched_idle_unit(sched_cpu));
         }
 
-- 
2.35.3

Re: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Posted by Dario Faggioli 1 year, 6 months ago

Ok, and now, something not really related to the bug being fixed here,
but about the code that is being touched:

On Fri, 2022-10-21 at 08:10 +0200, Juergen Gross wrote:
> diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
> index d6de25531b..960a8033e2 100644
> --- a/xen/common/sched/rt.c
> +++ b/xen/common/sched/rt.c
> @@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct
> sched_unit *currunit,
>          else if ( !unit_runnable_state(snext->unit) )
>          {
>              q_remove(snext);
> +            replq_remove(ops, snext);
>              snext = rt_unit(sched_idle_unit(sched_cpu));
>          }
>  
So, adding a few more context here, the code looks like this:

        snext = runq_pick(ops, cpumask_of(sched_cpu), cur_cpu);

        if ( snext == NULL )
            snext = rt_unit(sched_idle_unit(sched_cpu));
        else if ( !unit_runnable_state(snext->unit) )
        {
            q_remove(snext);
            snext = rt_unit(sched_idle_unit(sched_cpu));
        }

Basically, we've tried to pick-up the highest priority task from the
runqueue. If snext is NULL, the runqueue was just empty, so we pick up
idle (and then, later, we'll check whether the currently running unit
is still runnable; and if it is, we'll continue to run it, of course).

However, it can happen that --e.g., due to core-scheduling-- we picked
up a unit that, despite being in the runqueue, is not runnable. At this
point what we do is removing it from the runqueue (to avoid picking it
up again) and we go for idle.

Now, I may be missing/misremembering something, but it looks to me that
it's possible that there are other runnable units in the runqueue. And
if that's the case, why do we just pick idle and move on, instead of
continuing trying?

Juergen... Am I missing or misremembering any fundamental reason why we
cannot continue to scan the runqueue until the first runnable unit (if
any) is found?

Of course, this is not really related with the bug this patch is
fixing, which is correct and should be applied, no matter what the
outcome of this subthread will be. :-)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Re: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Posted by Juergen Gross 1 year, 6 months ago

On 21.10.22 11:37, Dario Faggioli wrote:
> Ok, and now, something not really related to the bug being fixed here,
> but about the code that is being touched:
> 
> On Fri, 2022-10-21 at 08:10 +0200, Juergen Gross wrote:
>> diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
>> index d6de25531b..960a8033e2 100644
>> --- a/xen/common/sched/rt.c
>> +++ b/xen/common/sched/rt.c
>> @@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct
>> sched_unit *currunit,
>>           else if ( !unit_runnable_state(snext->unit) )
>>           {
>>               q_remove(snext);
>> +            replq_remove(ops, snext);
>>               snext = rt_unit(sched_idle_unit(sched_cpu));
>>           }
>>   
> So, adding a few more context here, the code looks like this:
> 
>          snext = runq_pick(ops, cpumask_of(sched_cpu), cur_cpu);
> 
>          if ( snext == NULL )
>              snext = rt_unit(sched_idle_unit(sched_cpu));
>          else if ( !unit_runnable_state(snext->unit) )
>          {
>              q_remove(snext);
>              snext = rt_unit(sched_idle_unit(sched_cpu));
>          }
> 
> Basically, we've tried to pick-up the highest priority task from the
> runqueue. If snext is NULL, the runqueue was just empty, so we pick up
> idle (and then, later, we'll check whether the currently running unit
> is still runnable; and if it is, we'll continue to run it, of course).
> 
> However, it can happen that --e.g., due to core-scheduling-- we picked
> up a unit that, despite being in the runqueue, is not runnable. At this
> point what we do is removing it from the runqueue (to avoid picking it
> up again) and we go for idle.
> 
> Now, I may be missing/misremembering something, but it looks to me that
> it's possible that there are other runnable units in the runqueue. And
> if that's the case, why do we just pick idle and move on, instead of
> continuing trying?
> 
> Juergen... Am I missing or misremembering any fundamental reason why we
> cannot continue to scan the runqueue until the first runnable unit (if
> any) is found?

No. This code was introduced in the RFC V2 series of core scheduling.
And it was not the result of a previous discussion on xen-devel.

> Of course, this is not really related with the bug this patch is
> fixing, which is correct and should be applied, no matter what the
> outcome of this subthread will be. :-)

I can write another patch trying to fix that, but that shouldn't be
4.17 material IMHO.


Juergen

Re: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Posted by Dario Faggioli 1 year, 6 months ago

On Fri, 2022-10-21 at 08:10 +0200, Juergen Gross wrote:
> When a domain gets paused the unit runnable state can change to "not
> runnable" without the scheduling lock being involved. This means that
> a specific scheduler isn't involved in this change of runnable state.
> 
> In the RTDS scheduler this can result in an inconsistency in case a
> unit is losing its "runnable" capability while the RTDS scheduler's
> scheduling function is active. RTDS will remove the unit from the run
> queue, but doesn't do so for the replenish queue, leading to hitting
> an ASSERT() in replq_insert() later when the domain is unpaused
> again.
> 
> Fix that by removing the unit from the replenish queue as well in
> this
> case.
> 
Ah, ok... So, all is fine until what could happen during rt_schedule(),
was "just" that the currently running task, not only is descheduled,
but it also became !runnable.

In fact, in this case, the unit itself is not in the runq, but it can
be in the replq. However, since it still has the RTDS_scheduled flag
set, either:
1) we reach rt_context_saved(), which remove it from replq, before any 
   replq_insert;
2) rt_unit_wake() is called, but due to RTDS_scheduled, it may only do 
   replq_reinsert(), which is fine with the unit being already there.

However, what can also happen in rt_schedule() is that we remove from
the runq an unit that was not running, and hence does not have the
RTDS_scheduled flat set. In which case, rt_context_saved() doesn't do
anything to it (of course!). And as soon as rt_unit_wake() happens, it
does replq_insert(), which is not fine with finding the replenishment
event in the queue already.

So, yes... And good catch! :-P

> Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()")
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

RE: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler

Posted by Henry Wang 1 year, 6 months ago

Hi Juergen,

> -----Original Message-----
> From: Juergen Gross <jgross@suse.com>
> Subject: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler
> 
> When a domain gets paused the unit runnable state can change to "not
> runnable" without the scheduling lock being involved. This means that
> a specific scheduler isn't involved in this change of runnable state.
> 
> In the RTDS scheduler this can result in an inconsistency in case a
> unit is losing its "runnable" capability while the RTDS scheduler's
> scheduling function is active. RTDS will remove the unit from the run
> queue, but doesn't do so for the replenish queue, leading to hitting
> an ASSERT() in replq_insert() later when the domain is unpaused again.
> 
> Fix that by removing the unit from the replenish queue as well in this
> case.
> 
> Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Thanks for the quick fix.

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry

> ---
>  xen/common/sched/rt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
> index d6de25531b..960a8033e2 100644
> --- a/xen/common/sched/rt.c
> +++ b/xen/common/sched/rt.c
> @@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct
> sched_unit *currunit,
>          else if ( !unit_runnable_state(snext->unit) )
>          {
>              q_remove(snext);
> +            replq_remove(ops, snext);
>              snext = rt_unit(sched_idle_unit(sched_cpu));
>          }
> 
> --
> 2.35.3