[Xen-devel] [PATCH] xen/sched: fix csched2_deinit_pdata()

Juergen Gross posted 1 patch 4 years, 11 months ago
Failed in applying to current master (apply log)
xen/common/sched_credit2.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
[Xen-devel] [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Juergen Gross 4 years, 11 months ago
Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
introduced a regression when switching cpus between cpupools.

When assigning a cpu to a cpupool with credit2 being the default
scheduler csched2_deinit_pdata() is called for the credit2 private data
after the new scheduler's private data has been hooked to the per-cpu
scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
all per-cpu scheduler areas it knows of for removing the cpu from the
respective sibling masks including the area of the just moved cpu. This
will (depending on the new scheduler) either clobber the data of the
new scheduler or in case of sched_rt lead to a crash.

Avoid that by removing the cpu from the list of active cpus in credit2
data first.

The opposite problem is occurring when removing a cpu from a cpupool:
init_pdata() of credit2 will access the per-cpu data of the old
scheduler.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit2.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6958b265fc..9c1c3b4e08 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3813,22 +3813,21 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
         activate_runqueue(prv, spc->runq_id);
     }
 
-    __cpumask_set_cpu(cpu, &rqd->idle);
-    __cpumask_set_cpu(cpu, &rqd->active);
-    __cpumask_set_cpu(cpu, &prv->initialized);
-    __cpumask_set_cpu(cpu, &rqd->smt_idle);
+    __cpumask_set_cpu(cpu, &spc->sibling_mask);
 
-    /* On the boot cpu we are called before cpu_sibling_mask has been set up. */
-    if ( cpu == 0 && system_state < SYS_STATE_active )
-        __cpumask_set_cpu(cpu, &csched2_pcpu(cpu)->sibling_mask);
-    else
+    if ( cpumask_weight(&rqd->active) > 0 )
         for_each_cpu ( rcpu, per_cpu(cpu_sibling_mask, cpu) )
             if ( cpumask_test_cpu(rcpu, &rqd->active) )
             {
                 __cpumask_set_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
-                __cpumask_set_cpu(rcpu, &csched2_pcpu(cpu)->sibling_mask);
+                __cpumask_set_cpu(rcpu, &spc->sibling_mask);
             }
 
+    __cpumask_set_cpu(cpu, &rqd->idle);
+    __cpumask_set_cpu(cpu, &rqd->active);
+    __cpumask_set_cpu(cpu, &prv->initialized);
+    __cpumask_set_cpu(cpu, &rqd->smt_idle);
+
     if ( cpumask_weight(&rqd->active) == 1 )
         rqd->pick_bias = cpu;
 
@@ -3937,13 +3936,13 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, spc->runq_id);
 
-    for_each_cpu ( rcpu, &rqd->active )
-        __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
-
     __cpumask_clear_cpu(cpu, &rqd->idle);
     __cpumask_clear_cpu(cpu, &rqd->smt_idle);
     __cpumask_clear_cpu(cpu, &rqd->active);
 
+    for_each_cpu ( rcpu, &rqd->active )
+        __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
+
     if ( cpumask_empty(&rqd->active) )
     {
         printk(XENLOG_INFO " No cpus left on runqueue, disabling\n");
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Ping: [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Jan Beulich 4 years, 11 months ago
>>> On 08.05.19 at 13:31, <jgross@suse.com> wrote:
> Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
> introduced a regression when switching cpus between cpupools.
> 
> When assigning a cpu to a cpupool with credit2 being the default
> scheduler csched2_deinit_pdata() is called for the credit2 private data
> after the new scheduler's private data has been hooked to the per-cpu
> scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
> all per-cpu scheduler areas it knows of for removing the cpu from the
> respective sibling masks including the area of the just moved cpu. This
> will (depending on the new scheduler) either clobber the data of the
> new scheduler or in case of sched_rt lead to a crash.
> 
> Avoid that by removing the cpu from the list of active cpus in credit2
> data first.
> 
> The opposite problem is occurring when removing a cpu from a cpupool:
> init_pdata() of credit2 will access the per-cpu data of the old
> scheduler.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

May I ask what the disposition of this is? I've noticed too late
that I've backported the commit being fixed here without
waiting for this fix to go in. I'd prefer the stable trees, in
particular 4.11 for the impending 4.11.2 release, to be able
to pick this up soon.

Thanks, Jan

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -3813,22 +3813,21 @@ init_pdata(struct csched2_private *prv, struct csched2_pcpu *spc,
>          activate_runqueue(prv, spc->runq_id);
>      }
>  
> -    __cpumask_set_cpu(cpu, &rqd->idle);
> -    __cpumask_set_cpu(cpu, &rqd->active);
> -    __cpumask_set_cpu(cpu, &prv->initialized);
> -    __cpumask_set_cpu(cpu, &rqd->smt_idle);
> +    __cpumask_set_cpu(cpu, &spc->sibling_mask);
>  
> -    /* On the boot cpu we are called before cpu_sibling_mask has been set 
> up. */
> -    if ( cpu == 0 && system_state < SYS_STATE_active )
> -        __cpumask_set_cpu(cpu, &csched2_pcpu(cpu)->sibling_mask);
> -    else
> +    if ( cpumask_weight(&rqd->active) > 0 )
>          for_each_cpu ( rcpu, per_cpu(cpu_sibling_mask, cpu) )
>              if ( cpumask_test_cpu(rcpu, &rqd->active) )
>              {
>                  __cpumask_set_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
> -                __cpumask_set_cpu(rcpu, &csched2_pcpu(cpu)->sibling_mask);
> +                __cpumask_set_cpu(rcpu, &spc->sibling_mask);
>              }
>  
> +    __cpumask_set_cpu(cpu, &rqd->idle);
> +    __cpumask_set_cpu(cpu, &rqd->active);
> +    __cpumask_set_cpu(cpu, &prv->initialized);
> +    __cpumask_set_cpu(cpu, &rqd->smt_idle);
> +
>      if ( cpumask_weight(&rqd->active) == 1 )
>          rqd->pick_bias = cpu;
>  
> @@ -3937,13 +3936,13 @@ csched2_deinit_pdata(const struct scheduler *ops, 
> void *pcpu, int cpu)
>  
>      printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, 
> spc->runq_id);
>  
> -    for_each_cpu ( rcpu, &rqd->active )
> -        __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
> -
>      __cpumask_clear_cpu(cpu, &rqd->idle);
>      __cpumask_clear_cpu(cpu, &rqd->smt_idle);
>      __cpumask_clear_cpu(cpu, &rqd->active);
>  
> +    for_each_cpu ( rcpu, &rqd->active )
> +        __cpumask_clear_cpu(cpu, &csched2_pcpu(rcpu)->sibling_mask);
> +
>      if ( cpumask_empty(&rqd->active) )
>      {
>          printk(XENLOG_INFO " No cpus left on runqueue, disabling\n");
> -- 
> 2.16.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org 
> https://lists.xenproject.org/mailman/listinfo/xen-devel 





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Juergen Gross 4 years, 11 months ago
On 17/05/2019 15:17, Jan Beulich wrote:
>>>> On 08.05.19 at 13:31, <jgross@suse.com> wrote:
>> Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
>> introduced a regression when switching cpus between cpupools.
>>
>> When assigning a cpu to a cpupool with credit2 being the default
>> scheduler csched2_deinit_pdata() is called for the credit2 private data
>> after the new scheduler's private data has been hooked to the per-cpu
>> scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
>> all per-cpu scheduler areas it knows of for removing the cpu from the
>> respective sibling masks including the area of the just moved cpu. This
>> will (depending on the new scheduler) either clobber the data of the
>> new scheduler or in case of sched_rt lead to a crash.
>>
>> Avoid that by removing the cpu from the list of active cpus in credit2
>> data first.
>>
>> The opposite problem is occurring when removing a cpu from a cpupool:
>> init_pdata() of credit2 will access the per-cpu data of the old
>> scheduler.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> May I ask what the disposition of this is? I've noticed too late
> that I've backported the commit being fixed here without
> waiting for this fix to go in. I'd prefer the stable trees, in
> particular 4.11 for the impending 4.11.2 release, to be able
> to pick this up soon.

Dario already gave his Reviewed-by:


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Jan Beulich 4 years, 11 months ago
>>> On 17.05.19 at 15:24, <jgross@suse.com> wrote:
> On 17/05/2019 15:17, Jan Beulich wrote:
>>>>> On 08.05.19 at 13:31, <jgross@suse.com> wrote:
>>> Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
>>> introduced a regression when switching cpus between cpupools.
>>>
>>> When assigning a cpu to a cpupool with credit2 being the default
>>> scheduler csched2_deinit_pdata() is called for the credit2 private data
>>> after the new scheduler's private data has been hooked to the per-cpu
>>> scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
>>> all per-cpu scheduler areas it knows of for removing the cpu from the
>>> respective sibling masks including the area of the just moved cpu. This
>>> will (depending on the new scheduler) either clobber the data of the
>>> new scheduler or in case of sched_rt lead to a crash.
>>>
>>> Avoid that by removing the cpu from the list of active cpus in credit2
>>> data first.
>>>
>>> The opposite problem is occurring when removing a cpu from a cpupool:
>>> init_pdata() of credit2 will access the per-cpu data of the old
>>> scheduler.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> 
>> May I ask what the disposition of this is? I've noticed too late
>> that I've backported the commit being fixed here without
>> waiting for this fix to go in. I'd prefer the stable trees, in
>> particular 4.11 for the impending 4.11.2 release, to be able
>> to pick this up soon.
> 
> Dario already gave his Reviewed-by:

Oh, my fault then - apologies.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Juergen Gross 4 years, 11 months ago
On 17/05/2019 15:36, Jan Beulich wrote:
>>>> On 17.05.19 at 15:24, <jgross@suse.com> wrote:
>> On 17/05/2019 15:17, Jan Beulich wrote:
>>>>>> On 08.05.19 at 13:31, <jgross@suse.com> wrote:
>>>> Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
>>>> introduced a regression when switching cpus between cpupools.
>>>>
>>>> When assigning a cpu to a cpupool with credit2 being the default
>>>> scheduler csched2_deinit_pdata() is called for the credit2 private data
>>>> after the new scheduler's private data has been hooked to the per-cpu
>>>> scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
>>>> all per-cpu scheduler areas it knows of for removing the cpu from the
>>>> respective sibling masks including the area of the just moved cpu. This
>>>> will (depending on the new scheduler) either clobber the data of the
>>>> new scheduler or in case of sched_rt lead to a crash.
>>>>
>>>> Avoid that by removing the cpu from the list of active cpus in credit2
>>>> data first.
>>>>
>>>> The opposite problem is occurring when removing a cpu from a cpupool:
>>>> init_pdata() of credit2 will access the per-cpu data of the old
>>>> scheduler.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> May I ask what the disposition of this is? I've noticed too late
>>> that I've backported the commit being fixed here without
>>> waiting for this fix to go in. I'd prefer the stable trees, in
>>> particular 4.11 for the impending 4.11.2 release, to be able
>>> to pick this up soon.
>>
>> Dario already gave his Reviewed-by:
> 
> Oh, my fault then - apologies.

NP. In case you need the pointer:

https://lists.xen.org/archives/html/xen-devel/2019-05/msg00734.html


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/sched: fix csched2_deinit_pdata()
Posted by Dario Faggioli 4 years, 11 months ago
On Wed, 2019-05-08 at 13:31 +0200, Juergen Gross wrote:
> Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
> introduced a regression when switching cpus between cpupools.
> 
> When assigning a cpu to a cpupool with credit2 being the default
> scheduler csched2_deinit_pdata() is called for the credit2 private
> data
> after the new scheduler's private data has been hooked to the per-cpu
> scheduler data. Unfortunately csched2_deinit_pdata() will cycle
> through
> all per-cpu scheduler areas it knows of for removing the cpu from the
> respective sibling masks including the area of the just moved cpu.
> This
> will (depending on the new scheduler) either clobber the data of the
> new scheduler or in case of sched_rt lead to a crash.
> 
These issues will drive us all crazy, at some point! :-(

I really wish I managed to put a cpupool test in OSSTest. I guess I can
pickup my old draft, but I need some time to put together an OSSTest
dev environment again. :-/

Anyway...

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel