[Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling

Juergen Gross posted 1 patch 4 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200303123058.27210-1-jgross@suse.com
There is a newer version of this series
xen/common/sched/core.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
[Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Juergen Gross 4 years, 1 month ago
Offlining a cpu with core scheduling active can result in a hanging
system. The reason is that the scheduling resource and unit of the cpu
to be removed need to be split in order to remove the cpu from its
cpupool and to move it to the idle scheduler. In case one of the
involved cpus happens to have received a sched slave event, because a
vcpu formerly running on that cpu has been woken up again, this cpu can
enter sched_wait_rendezvous_in() while its scheduling resource is just
about to be split. It might then wait forever for the other sibling to
join, which will never happen as the resources are already being
modified.
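
In heavily condensed form the problematic path looks roughly like this
(only the identifiers appearing in the hunks below are taken from the
real code; the "last sibling in" helper and the wait loop are
simplified):

    /* Condensed sketch of sched_wait_rendezvous_in(), not the literal code. */
    sr = get_sched_res(cpu);                      /* resource captured up front */

    if ( !--prev->rendezvous_in_cnt )
        return schedule_and_wake_siblings();      /* hypothetical helper name */

    while ( prev->rendezvous_in_cnt )             /* wait for the siblings */
    {
        pcpu_schedule_unlock_irq(*lock, cpu);
        /*
         * If schedule_cpu_rm() splits the scheduling resource right here,
         * the siblings will never join and rendezvous_in_cnt will never
         * drop to zero: the cpu hangs.
         */
        *lock = pcpu_schedule_lock_irq(cpu);

        /* Check added by this patch: bail out if the resource was switched. */
        if ( unlikely(sr != get_sched_res(cpu)) )
        {
            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
            prev->rendezvous_in_cnt = 0;
            return NULL;
        }
    }

    return prev->next_task;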

This can easily be avoided by:
- resetting the rendezvous counters of the idle unit which is kept
- checking for a new scheduling resource in sched_wait_rendezvous_in()
  after reacquiring the scheduling lock and resetting the counters in
  that case without scheduling another vcpu
- moving scheduling resource modification (in schedule_cpu_rm()) and
  retrieval (in schedule(); sched_slave() is fine already, others are not
  critical) into locked regions

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/core.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 7e8e7d2c39..723283ed00 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2415,7 +2415,8 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 {
     struct sched_unit *next;
     struct vcpu *v;
-    unsigned int gran = get_sched_res(cpu)->granularity;
+    struct sched_resource *sr = get_sched_res(cpu);
+    unsigned int gran = sr->granularity;
 
     if ( !--prev->rendezvous_in_cnt )
     {
@@ -2482,6 +2483,19 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
             atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
             prev->rendezvous_in_cnt = 0;
         }
+
+        /*
+         * Check for a switched scheduling resource. This happens when we
+         * are moved away from our cpupool and the cpus are now subject to
+         * the idle scheduler.
+         */
+        if ( unlikely(sr != get_sched_res(cpu)) )
+        {
+            ASSERT(is_idle_unit(prev));
+            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
+            prev->rendezvous_in_cnt = 0;
+            return NULL;
+        }
     }
 
     return prev->next_task;
@@ -2538,7 +2552,10 @@ static void sched_slave(void)
 
     next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
     if ( !next )
+    {
+        rcu_read_unlock(&sched_res_rculock);
         return;
+    }
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
@@ -2567,11 +2584,11 @@ static void schedule(void)
 
     rcu_read_lock(&sched_res_rculock);
 
+    lock = pcpu_schedule_lock_irq(cpu);
+
     sr = get_sched_res(cpu);
     gran = sr->granularity;
 
-    lock = pcpu_schedule_lock_irq(cpu);
-
     if ( prev->rendezvous_in_cnt )
     {
         /*
@@ -2599,7 +2616,10 @@ static void schedule(void)
         cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
         next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
         if ( !next )
+        {
+            rcu_read_unlock(&sched_res_rculock);
             return;
+        }
     }
     else
     {
@@ -3151,7 +3171,10 @@ int schedule_cpu_rm(unsigned int cpu)
         per_cpu(sched_res_idx, cpu_iter) = 0;
         if ( cpu_iter == cpu )
         {
-            idle_vcpu[cpu_iter]->sched_unit->priv = NULL;
+            unit = idle_vcpu[cpu_iter]->sched_unit;
+            unit->priv = NULL;
+            atomic_set(&unit->next_task->rendezvous_out_cnt, 0);
+            unit->rendezvous_in_cnt = 0;
         }
         else
         {
@@ -3182,6 +3205,8 @@ int schedule_cpu_rm(unsigned int cpu)
     }
     sr->scheduler = &sched_idle_ops;
     sr->sched_priv = NULL;
+    sr->granularity = 1;
+    sr->cpupool = NULL;
 
     smp_mb();
     sr->schedule_lock = &sched_free_cpu_lock;
@@ -3194,9 +3219,6 @@ int schedule_cpu_rm(unsigned int cpu)
     sched_free_udata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
-    sr->granularity = 1;
-    sr->cpupool = NULL;
-
 out:
     rcu_read_unlock(&sched_res_rculock);
     xfree(sr_new);
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Jan Beulich 4 years, 1 month ago
On 03.03.2020 13:30, Juergen Gross wrote:
> @@ -2538,7 +2552,10 @@ static void sched_slave(void)
>  
>      next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>      if ( !next )
> +    {
> +        rcu_read_unlock(&sched_res_rculock);
>          return;
> +    }

This and ...

> @@ -2599,7 +2616,10 @@ static void schedule(void)
>          cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
>          next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>          if ( !next )
> +        {
> +            rcu_read_unlock(&sched_res_rculock);
>              return;
> +        }

... this look like independent fixes, as on Arm,
sched_wait_rendezvous_in() can already return NULL. If they get
folded into here, I think the description should clarify that
these are orthogonal to the rest.

Jan

Re: [Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Jürgen Groß 4 years, 1 month ago
On 03.03.20 14:45, Jan Beulich wrote:
> On 03.03.2020 13:30, Juergen Gross wrote:
>> @@ -2538,7 +2552,10 @@ static void sched_slave(void)
>>   
>>       next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>       if ( !next )
>> +    {
>> +        rcu_read_unlock(&sched_res_rculock);
>>           return;
>> +    }
> 
> This and ...
> 
>> @@ -2599,7 +2616,10 @@ static void schedule(void)
>>           cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
>>           next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>           if ( !next )
>> +        {
>> +            rcu_read_unlock(&sched_res_rculock);
>>               return;
>> +        }
> 
> ... this look like independent fixes, as on Arm,
> sched_wait_rendezvous_in() can already return NULL. If they get
> folded into here, I think the description should clarify that
> these are orthogonal to the rest.

Yeah, probably better to split the patch.


Juergen

Re: [Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Jürgen Groß 4 years, 1 month ago
On 03.03.20 17:05, Jürgen Groß wrote:
> On 03.03.20 14:45, Jan Beulich wrote:
>> On 03.03.2020 13:30, Juergen Gross wrote:
>>> @@ -2538,7 +2552,10 @@ static void sched_slave(void)
>>>       next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>       if ( !next )
>>> +    {
>>> +        rcu_read_unlock(&sched_res_rculock);
>>>           return;
>>> +    }
>>
>> This and ...
>>
>>> @@ -2599,7 +2616,10 @@ static void schedule(void)
>>>           cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
>>>           next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>           if ( !next )
>>> +        {
>>> +            rcu_read_unlock(&sched_res_rculock);
>>>               return;
>>> +        }
>>
>> ... this look like independent fixes, as on Arm,
>> sched_wait_rendezvous_in() can already return NULL. If they get
>> folded into here, I think the description should clarify that
>> these are orthogonal to the rest.
> 
> Yeah, probably better to split the patch.

Oh, this patch was wrong: Up to now sched_wait_rendezvous_in() always
was responsible for dropping sched_res_rculock, so I should do that in
the new NULL return case, too.

So no patch splitting, but V2.
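
Roughly (just a sketch of the intended V2 shape, details might still
change), the new bail-out path would drop the locks itself, like the
existing NULL return does via sched_context_switch():

    if ( unlikely(sr != get_sched_res(cpu)) )
    {
        ASSERT(is_idle_unit(prev));
        atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
        prev->rendezvous_in_cnt = 0;
        pcpu_schedule_unlock_irq(*lock, cpu);     /* match the other NULL exit */
        rcu_read_unlock(&sched_res_rculock);      /* no longer in the callers */
        return NULL;
    }

That way sched_slave() and schedule() could keep a plain "return;" for
the NULL case.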


Juergen

Re: [Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Jan Beulich 4 years, 1 month ago
On 03.03.2020 17:20, Jürgen Groß wrote:
> On 03.03.20 17:05, Jürgen Groß wrote:
>> On 03.03.20 14:45, Jan Beulich wrote:
>>> On 03.03.2020 13:30, Juergen Gross wrote:
>>>> @@ -2538,7 +2552,10 @@ static void sched_slave(void)
>>>>       next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>>       if ( !next )
>>>> +    {
>>>> +        rcu_read_unlock(&sched_res_rculock);
>>>>           return;
>>>> +    }
>>>
>>> This and ...
>>>
>>>> @@ -2599,7 +2616,10 @@ static void schedule(void)
>>>>           cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
>>>>           next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>>           if ( !next )
>>>> +        {
>>>> +            rcu_read_unlock(&sched_res_rculock);
>>>>               return;
>>>> +        }
>>>
>>> ... this look like independent fixes, as on Arm,
>>> sched_wait_rendezvous_in() can already return NULL. If they get
>>> folded into here, I think the description should clarify that
>>> these are orthogonal to the rest.
>>
>> Yeah, probably better to split the patch.
> 
> Oh, this patch was wrong: Up to now sched_wait_rendezvous_in() always
> was responsible for dropping sched_res_rculock, so I should do that in
> the new NULL return case, too.

Oh, through calling of sched_context_switch(). I guess both functions
want to gain a comment about this aspect of their behavior.

Jan

Re: [Xen-devel] [PATCH] xen/sched: fix cpu offlining with core scheduling
Posted by Jürgen Groß 4 years, 1 month ago
On 03.03.20 17:27, Jan Beulich wrote:
> On 03.03.2020 17:20, Jürgen Groß wrote:
>> On 03.03.20 17:05, Jürgen Groß wrote:
>>> On 03.03.20 14:45, Jan Beulich wrote:
>>>> On 03.03.2020 13:30, Juergen Gross wrote:
>>>>> @@ -2538,7 +2552,10 @@ static void sched_slave(void)
>>>>>        next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>>>        if ( !next )
>>>>> +    {
>>>>> +        rcu_read_unlock(&sched_res_rculock);
>>>>>            return;
>>>>> +    }
>>>>
>>>> This and ...
>>>>
>>>>> @@ -2599,7 +2616,10 @@ static void schedule(void)
>>>>>            cpumask_raise_softirq(mask, SCHED_SLAVE_SOFTIRQ);
>>>>>            next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
>>>>>            if ( !next )
>>>>> +        {
>>>>> +            rcu_read_unlock(&sched_res_rculock);
>>>>>                return;
>>>>> +        }
>>>>
>>>> ... this look like independent fixes, as on Arm,
>>>> sched_wait_rendezvous_in() can already return NULL. If they get
>>>> folded into here, I think the description should clarify that
>>>> these are orthogonal to the rest.
>>>
>>> Yeah, probably better to split the patch.
>>
>> Oh, this patch was wrong: Up to now sched_wait_rendezvous_in() always
>> was responsible for dropping sched_res_rculock, so I should do that in
>> the new NULL return case, too.
> 
> Oh, through calling of sched_context_switch(). I guess both functions
> want to gain a comment about this aspect of their behavior.

Yes, already done. :-)


Juergen
