[PATCH] sched: correct sched_move_domain()'s cleanup path

Posted by Jan Beulich 4 months, 3 weeks ago
It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard <renewin@proton.me>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
     for ( unit = old_units; unit; )
     {
         if ( unit->priv )
-            sched_free_udata(c->sched, unit->priv);
+            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
         old_unit = unit;
         unit = unit->next_in_list;
         xfree(old_unit);

Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by George Dunlap 4 months, 3 weeks ago
On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> It is only in the error case that we want to clean up the new pool's
> scheduler data; in the success case it's rather the old scheduler's
> data which needs cleaning up.
>
> Reported-by: René Winther Højgaard <renewin@proton.me>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>      for ( unit = old_units; unit; )
>      {
>          if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>          old_unit = unit;
>          unit = unit->next_in_list;
>          xfree(old_unit);

This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.
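(For context, here is a rough reconstruction of the unpatched code, pieced
together from the hunks quoted in this thread; surrounding lines and exact
indentation are approximate:)

8<-------
    /* Error path inside the unit allocation loop (around line 720): */
        if ( !unit || !unit->priv )
        {
            old_units = new_units;
            old_domdata = domdata;
            ret = -ENOMEM;
            goto out_free;
        }

    /* Shared cleanup at the end of the function (around line 810): */
 out_free:
    for ( unit = old_units; unit; )
    {
        if ( unit->priv )
            sched_free_udata(c->sched, unit->priv);  /* always the new ops */
        old_unit = unit;
        unit = unit->next_in_list;
        xfree(old_unit);
    }

    sched_free_domdata(old_ops, old_domdata);        /* always the old ops */

 out:
    rcu_read_unlock(&sched_res_rculock);
---->8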

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<-------
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         {
             old_units = new_units;
             old_domdata = domdata;
+            old_ops = c->sched;
             ret = -ENOMEM;
             goto out_free;
         }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     domain_unpause(d);

  out_free:
+    /*
+     * NB if we've jumped here, "old_units", "old_ops", and so on will
+     * actually be pointing to the new ops, since when aborting it's
+     * the new ops we want to free.
+     */
     for ( unit = old_units; unit; )
     {
         if ( unit->priv )
-            sched_free_udata(c->sched, unit->priv);
+            sched_free_udata(old_ops, unit->priv);
         old_unit = unit;
         unit = unit->next_in_list;
         xfree(old_unit);
---->8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?

 -George
From ab7ecae921d0fa21bd2561990c91aeda79275cef Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@cloud.com>
Date: Mon, 4 Dec 2023 12:25:03 +0000
Subject: [PATCH] sched: clarify and correct sched_move_domain()'s cleanup path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

While moving a domain, sched_move_domain first makes per-scheduler
allocations for the scheduler of the new cpupool, then deletes the old
ones.  In the event of an error, the *new* allocations must be freed
via the new scheduler.

In order to avoid code duplication, the error path commandeered the
old_units and old_domdata variables to point them at the new units, so
that the code which on the non-error path frees the old units would now
free the new units.

This code, however, had two bugs in it.  First, when looping over the
units, it always frees using "c->sched" (the new ops); meaning on the
success path we're freeing the old units' scheduler data using the new
scheduler.  Secondly, when freeing the domdata, it always frees using
old_ops, meaning on the failure path we're freeing the new domdata
using the old scheduler.

We could follow suit with the existing code, and on the error path set
old_ops to c->sched, and always use "old_ops" in the freeing code.
This would solve both problems.

However, given the number of mistakes already made in this code, it
seems like a better option is to make a new set of explicitly-named
"free_*" variables to use for this purpose.

While here, rename "domdata" to "new_domdata", in line with
"new_units", for clarity.

Reported-by: René Winther Højgaard <renewin@proton.me>
Initial-fix-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
 xen/common/sched/core.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..b4d0785903 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -678,12 +678,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
     struct sched_unit *unit, *old_unit;
-    struct sched_unit *new_units = NULL, *old_units;
+    struct sched_unit *new_units = NULL, *old_units, *free_units;
     struct sched_unit **unit_ptr = &new_units;
     unsigned int new_p, unit_idx;
-    void *domdata;
-    struct scheduler *old_ops = dom_scheduler(d);
-    void *old_domdata;
+    struct scheduler *old_ops = dom_scheduler(d), *free_ops;
+    void *new_domdata, *old_domdata, *free_domdata;
     unsigned int gran = cpupool_get_granularity(c);
     unsigned int n_units = d->vcpu[0] ? DIV_ROUND_UP(d->max_vcpus, gran) : 0;
     int ret = 0;
@@ -696,10 +695,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     rcu_read_lock(&sched_res_rculock);
 
-    domdata = sched_alloc_domdata(c->sched, d);
-    if ( IS_ERR(domdata) )
+    new_domdata = sched_alloc_domdata(c->sched, d);
+    if ( IS_ERR(new_domdata) )
     {
-        ret = PTR_ERR(domdata);
+        ret = PTR_ERR(new_domdata);
         goto out;
     }
 
@@ -712,14 +711,16 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
             unit->domain = d;
             unit->unit_id = unit_idx * gran;
             unit->vcpu_list = d->vcpu[unit->unit_id];
-            unit->priv = sched_alloc_udata(c->sched, unit, domdata);
+            unit->priv = sched_alloc_udata(c->sched, unit, new_domdata);
             *unit_ptr = unit;
         }
 
         if ( !unit || !unit->priv )
         {
-            old_units = new_units;
-            old_domdata = domdata;
+            /* Failure path frees the new units/domdata from the new ops */
+            free_units = new_units;
+            free_domdata = new_domdata;
+            free_ops = c->sched;
             ret = -ENOMEM;
             goto out_free;
         }
@@ -751,7 +752,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     old_units = d->sched_unit_list;
 
     d->cpupool = c;
-    d->sched_priv = domdata;
+    d->sched_priv = new_domdata;
 
     unit = new_units;
     for_each_vcpu ( d, v )
@@ -808,17 +809,22 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     domain_unpause(d);
 
+    /* Success path frees the old units/domdata from the old ops */
+    free_ops = old_ops;
+    free_units = old_units;
+    free_domdata = old_domdata;
+
  out_free:
-    for ( unit = old_units; unit; )
+    for ( unit = free_units; unit; )
     {
         if ( unit->priv )
-            sched_free_udata(c->sched, unit->priv);
+            sched_free_udata(free_ops, unit->priv);
         old_unit = unit;
         unit = unit->next_in_list;
         xfree(old_unit);
     }
 
-    sched_free_domdata(old_ops, old_domdata);
+    sched_free_domdata(free_ops, free_domdata);
 
  out:
     rcu_read_unlock(&sched_res_rculock);
-- 
2.25.1

Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Juergen Gross 4 months, 3 weeks ago
On 04.12.23 14:00, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> It is only in the error case that we want to clean up the new pool's
>> scheduler data; in the success case it's rather the old scheduler's
>> data which needs cleaning up.
>>
>> Reported-by: René Winther Højgaard <renewin@proton.me>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>       for ( unit = old_units; unit; )
>>       {
>>           if ( unit->priv )
>> -            sched_free_udata(c->sched, unit->priv);
>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>           old_unit = unit;
>>           unit = unit->next_in_list;
>>           xfree(old_unit);
> 
> This code is unfortunately written in a "clever" way which seems to
> have introduced some confusion.  The one place which calls "goto
> out_free" goes through and replaces *most* of the "old_*" variables
> with the "new" equivalents.  That's why we're iterating over
> `old_units` even on the failure path.
> 
> The result is that this change doesn't catch another bug on the
> following line, in the error case:
> 
> sched_free_domdata(old_ops, old_domdata);
> 
> At this point, old_ops is still the old ops, but old_domdata is the
> *new* domdata.
> 
> A patch like the following (compile tested only) would fix it along
> the lines of the original intent:
> 8<-------
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index eba0cea4bb..78f21839d3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>           {
>               old_units = new_units;
>               old_domdata = domdata;
> +            old_ops = c->sched;
>               ret = -ENOMEM;
>               goto out_free;
>           }
> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>       domain_unpause(d);
> 
>    out_free:
> +    /*
> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> +     * actually be pointing to the new ops, since when aborting it's
> +     * the new ops we want to free.
> +     */
>       for ( unit = old_units; unit; )
>       {
>           if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(old_ops, unit->priv);
>           old_unit = unit;
>           unit = unit->next_in_list;
>           xfree(old_unit);
> ---->8
> 
> But given that this kind of cleverness has already fooled two of our
> most senior developers, I'd suggest making the whole thing more
> explicit; something like the attached (again compile-tested only)?

And I have yet a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
the attached patch (this time it should really be there).
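[The attachment itself is not reproduced in this archive. Below is a rough
sketch of the kind of helper being described; only the name
sched_move_domain_cleanup() is confirmed by the replies further down, and
the signature and body here are guesses.]

8<-------
static void sched_move_domain_cleanup(struct scheduler *ops,
                                      struct sched_unit *units,
                                      void *domdata)
{
    struct sched_unit *unit, *old_unit;

    /* Free each unit's scheduler data, then the unit itself. */
    for ( unit = units; unit; )
    {
        if ( unit->priv )
            sched_free_udata(ops, unit->priv);
        old_unit = unit;
        unit = unit->next_in_list;
        xfree(old_unit);
    }

    /* Free the per-domain scheduler data with the same ops. */
    sched_free_domdata(ops, domdata);
}
---->8

The idea would be that the caller passes either the new or the old
ops/units/domdata explicitly, so no variable has to be repurposed on the
error path.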

Thoughts?


Juergen
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Jan Beulich 4 months, 3 weeks ago
On 04.12.2023 15:10, Juergen Gross wrote:
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Looks neater to me than George's. Just one minor thing: please can the
first parameter of sched_move_domain_cleanup() be constified?

Jan
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Juergen Gross 4 months, 3 weeks ago
On 04.12.23 15:38, Jan Beulich wrote:
> On 04.12.2023 15:10, Juergen Gross wrote:
>> And I have again a third approach, making it crystal clear what is happening
>> with which data. No need to explain what is freed via which variables. See
>> attached patch (this time it should be really there).
> 
> Looks more neat to me than George's. Just one minor thing: Please can the
> first parameter of sched_move_domain_cleanup() be constified?

Yes, will do that.

I'll send out V2 soon together with the other fix (this probably wants
an update of the commit message) and a small cleanup patch I have.


Juergen
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by George Dunlap 4 months, 3 weeks ago
On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard <renewin@proton.me>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> Reviewed-by: Juergen Gross <jgross@suse.com>
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>       for ( unit = old_units; unit; )
> >>       {
> >>           if ( unit->priv )
> >> -            sched_free_udata(c->sched, unit->priv);
> >> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>           old_unit = unit;
> >>           unit = unit->next_in_list;
> >>           xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<-------
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >           {
> >               old_units = new_units;
> >               old_domdata = domdata;
> > +            old_ops = c->sched;
> >               ret = -ENOMEM;
> >               goto out_free;
> >           }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >       domain_unpause(d);
> >
> >    out_free:
> > +    /*
> > +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> > +     * actually be pointing to the new ops, since when aborting it's
> > +     * the new ops we want to free.
> > +     */
> >       for ( unit = old_units; unit; )
> >       {
> >           if ( unit->priv )
> > -            sched_free_udata(c->sched, unit->priv);
> > +            sched_free_udata(old_ops, unit->priv);
> >           old_unit = unit;
> >           unit = unit->next_in_list;
> >           xfree(old_unit);
> > ---->8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Yes, I thought about making a function as well -- that works for me too.

Personally I prefer to keep the "goto out", rather than duplicating
the rcu_read_unlock().  I'd yield if Jan said he preferred
duplication, however.

 -George
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Jan Beulich 4 months, 3 weeks ago
On 04.12.2023 15:18, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>>
>> On 04.12.23 14:00, George Dunlap wrote:
>>> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> It is only in the error case that we want to clean up the new pool's
>>>> scheduler data; in the success case it's rather the old scheduler's
>>>> data which needs cleaning up.
>>>>
>>>> Reported-by: René Winther Højgaard <renewin@proton.me>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>>>       for ( unit = old_units; unit; )
>>>>       {
>>>>           if ( unit->priv )
>>>> -            sched_free_udata(c->sched, unit->priv);
>>>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>>>           old_unit = unit;
>>>>           unit = unit->next_in_list;
>>>>           xfree(old_unit);
>>>
>>> This code is unfortunately written in a "clever" way which seems to
>>> have introduced some confusion.  The one place which calls "goto
>>> out_free" goes through and replaces *most* of the "old_*" variables
>>> with the "new" equivalents.  That's why we're iterating over
>>> `old_units` even on the failure path.
>>>
>>> The result is that this change doesn't catch another bug on the
>>> following line, in the error case:
>>>
>>> sched_free_domdata(old_ops, old_domdata);
>>>
>>> At this point, old_ops is still the old ops, but old_domdata is the
>>> *new* domdata.
>>>
>>> A patch like the following (compile tested only) would fix it along
>>> the lines of the original intent:
>>> 8<-------
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index eba0cea4bb..78f21839d3 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>           {
>>>               old_units = new_units;
>>>               old_domdata = domdata;
>>> +            old_ops = c->sched;
>>>               ret = -ENOMEM;
>>>               goto out_free;
>>>           }
>>> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>       domain_unpause(d);
>>>
>>>    out_free:
>>> +    /*
>>> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
>>> +     * actually be pointing to the new ops, since when aborting it's
>>> +     * the new ops we want to free.
>>> +     */
>>>       for ( unit = old_units; unit; )
>>>       {
>>>           if ( unit->priv )
>>> -            sched_free_udata(c->sched, unit->priv);
>>> +            sched_free_udata(old_ops, unit->priv);
>>>           old_unit = unit;
>>>           unit = unit->next_in_list;
>>>           xfree(old_unit);
>>> ---->8
>>>
>>> But given that this kind of cleverness has already fooled two of our
>>> most senior developers, I'd suggest making the whole thing more
>>> explicit; something like the attached (again compile-tested only)?
>>
>> And I have again a third approach, making it crystal clear what is happening
>> with which data. No need to explain what is freed via which variables. See
>> attached patch (this time it should be really there).
> 
> Yes, I thought about making a function as well -- that works for me too.
> 
> Personally I prefer to keep the "goto out", rather than duplicating
> the rcu_read_unlock().  I'd yield if Jan said he preferred
> duplication, however.

I'm on the fence there, actually.

Jan

Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Juergen Gross 4 months, 3 weeks ago
On 04.12.23 15:39, Jan Beulich wrote:
> On 04.12.2023 15:18, George Dunlap wrote:
>> On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>>>
>>> On 04.12.23 14:00, George Dunlap wrote:
>>>> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>
>>>>> It is only in the error case that we want to clean up the new pool's
>>>>> scheduler data; in the success case it's rather the old scheduler's
>>>>> data which needs cleaning up.
>>>>>
>>>>> Reported-by: René Winther Højgaard <renewin@proton.me>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> --- a/xen/common/sched/core.c
>>>>> +++ b/xen/common/sched/core.c
>>>>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>>>>        for ( unit = old_units; unit; )
>>>>>        {
>>>>>            if ( unit->priv )
>>>>> -            sched_free_udata(c->sched, unit->priv);
>>>>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>>>>            old_unit = unit;
>>>>>            unit = unit->next_in_list;
>>>>>            xfree(old_unit);
>>>>
>>>> This code is unfortunately written in a "clever" way which seems to
>>>> have introduced some confusion.  The one place which calls "goto
>>>> out_free" goes through and replaces *most* of the "old_*" variables
>>>> with the "new" equivalents.  That's why we're iterating over
>>>> `old_units` even on the failure path.
>>>>
>>>> The result is that this change doesn't catch another bug on the
>>>> following line, in the error case:
>>>>
>>>> sched_free_domdata(old_ops, old_domdata);
>>>>
>>>> At this point, old_ops is still the old ops, but old_domdata is the
>>>> *new* domdata.
>>>>
>>>> A patch like the following (compile tested only) would fix it along
>>>> the lines of the original intent:
>>>> 8<-------
>>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>>> index eba0cea4bb..78f21839d3 100644
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>>            {
>>>>                old_units = new_units;
>>>>                old_domdata = domdata;
>>>> +            old_ops = c->sched;
>>>>                ret = -ENOMEM;
>>>>                goto out_free;
>>>>            }
>>>> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>>        domain_unpause(d);
>>>>
>>>>     out_free:
>>>> +    /*
>>>> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
>>>> +     * actually be pointing to the new ops, since when aborting it's
>>>> +     * the new ops we want to free.
>>>> +     */
>>>>        for ( unit = old_units; unit; )
>>>>        {
>>>>            if ( unit->priv )
>>>> -            sched_free_udata(c->sched, unit->priv);
>>>> +            sched_free_udata(old_ops, unit->priv);
>>>>            old_unit = unit;
>>>>            unit = unit->next_in_list;
>>>>            xfree(old_unit);
>>>> ---->8
>>>>
>>>> But given that this kind of cleverness has already fooled two of our
>>>> most senior developers, I'd suggest making the whole thing more
>>>> explicit; something like the attached (again compile-tested only)?
>>>
>>> And I have again a third approach, making it crystal clear what is happening
>>> with which data. No need to explain what is freed via which variables. See
>>> attached patch (this time it should be really there).
>>
>> Yes, I thought about making a function as well -- that works for me too.
>>
>> Personally I prefer to keep the "goto out", rather than duplicating
>> the rcu_read_unlock().  I'd yield if Jan said he preferred
>> duplication, however.
> 
> I'm on the edge there actually.

In this case I'd prefer it my way, as it avoids having to scroll down to the
out: label to see what is happening there. Additionally, it makes it possible
to get rid of the ret variable.

In the end the main part of the patch is the new function, so I don't feel
strongly about dropping the "goto out".


Juergen
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by George Dunlap 4 months, 3 weeks ago
On Mon, Dec 4, 2023 at 2:44 PM Juergen Gross <jgross@suse.com> wrote:
> >> Personally I prefer to keep the "goto out", rather than duplicating
> >> the rcu_read_unlock().  I'd yield if Jan said he preferred
> >> duplication, however.
> >
> > I'm on the edge there actually.
>
> In this case I'd prefer it my way, as it avoids having to scroll down to the
> out: label to see what is happening there. Additionally it enables to get rid
> of the ret variable.

The issue is, suppose we change something else, like needing to grab
(and release) another lock?  Sharing the exit path makes it easier to
avoid those kinds of mistakes.

 -George
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Juergen Gross 4 months, 3 weeks ago
On 04.12.23 20:33, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 2:44 PM Juergen Gross <jgross@suse.com> wrote:
>>>> Personally I prefer to keep the "goto out", rather than duplicating
>>>> the rcu_read_unlock().  I'd yield if Jan said he preferred
>>>> duplication, however.
>>>
>>> I'm on the edge there actually.
>>
>> In this case I'd prefer it my way, as it avoids having to scroll down to the
>> out: label to see what is happening there. Additionally it enables to get rid
>> of the ret variable.
> 
> The issue is, suppose we change something else, like needing to grab
> (and release) another lock?  Sharing the exit path makes it easier to
> avoid those kinds of mistakes.

Yes, this could happen. OTOH it could happen that an action is added
on the exit path which should _not_ be executed in the error case.

I agree that when more than one exit action is needed, the goto approach
is superior. For zero exit actions it is inferior, while for one it will
depend on the specific case IMHO.
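
(Purely as an illustration of the two shapes being weighed here, with
something_failed() standing in for the real error check, the choice is
roughly:)

8<-------
    /* Shape A: shared exit path, kept in one place via "goto out". */
    if ( something_failed() )
    {
        ret = -ENOMEM;
        goto out;
    }
    ...
 out:
    rcu_read_unlock(&sched_res_rculock);
    return ret;

    /*
     * Shape B: the exit action duplicated at each return, which lets
     * "ret" go away but must be kept in sync if more actions appear.
     */
    if ( something_failed() )
    {
        rcu_read_unlock(&sched_res_rculock);
        return -ENOMEM;
    }
    ...
    rcu_read_unlock(&sched_res_rculock);
    return 0;
---->8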


Juergen
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by Juergen Gross 4 months, 3 weeks ago
On 04.12.23 14:00, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> It is only in the error case that we want to clean up the new pool's
>> scheduler data; in the success case it's rather the old scheduler's
>> data which needs cleaning up.
>>
>> Reported-by: René Winther Højgaard <renewin@proton.me>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>       for ( unit = old_units; unit; )
>>       {
>>           if ( unit->priv )
>> -            sched_free_udata(c->sched, unit->priv);
>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>           old_unit = unit;
>>           unit = unit->next_in_list;
>>           xfree(old_unit);
> 
> This code is unfortunately written in a "clever" way which seems to
> have introduced some confusion.  The one place which calls "goto
> out_free" goes through and replaces *most* of the "old_*" variables
> with the "new" equivalents.  That's why we're iterating over
> `old_units` even on the failure path.
> 
> The result is that this change doesn't catch another bug on the
> following line, in the error case:
> 
> sched_free_domdata(old_ops, old_domdata);
> 
> At this point, old_ops is still the old ops, but old_domdata is the
> *new* domdata.
> 
> A patch like the following (compile tested only) would fix it along
> the lines of the original intent:
> 8<-------
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index eba0cea4bb..78f21839d3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>           {
>               old_units = new_units;
>               old_domdata = domdata;
> +            old_ops = c->sched;
>               ret = -ENOMEM;
>               goto out_free;
>           }
> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>       domain_unpause(d);
> 
>    out_free:
> +    /*
> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> +     * actually be pointing to the new ops, since when aborting it's
> +     * the new ops we want to free.
> +     */
>       for ( unit = old_units; unit; )
>       {
>           if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(old_ops, unit->priv);
>           old_unit = unit;
>           unit = unit->next_in_list;
>           xfree(old_unit);
> ---->8
> 
> But given that this kind of cleverness has already fooled two of our
> most senior developers, I'd suggest making the whole thing more
> explicit; something like the attached (again compile-tested only)?

And I have yet a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
the attached patch.

Thoughts?


Juergen
Re: [PATCH] sched: correct sched_move_domain()'s cleanup path
Posted by George Dunlap 4 months, 3 weeks ago
On Mon, Dec 4, 2023 at 1:47 PM Juergen Gross <jgross@suse.com> wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard <renewin@proton.me>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> Reviewed-by: Juergen Gross <jgross@suse.com>
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>       for ( unit = old_units; unit; )
> >>       {
> >>           if ( unit->priv )
> >> -            sched_free_udata(c->sched, unit->priv);
> >> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>           old_unit = unit;
> >>           unit = unit->next_in_list;
> >>           xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<-------
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >           {
> >               old_units = new_units;
> >               old_domdata = domdata;
> > +            old_ops = c->sched;
> >               ret = -ENOMEM;
> >               goto out_free;
> >           }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >       domain_unpause(d);
> >
> >    out_free:
> > +    /*
> > +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> > +     * actually be pointing to the new ops, since when aborting it's
> > +     * the new ops we want to free.
> > +     */
> >       for ( unit = old_units; unit; )
> >       {
> >           if ( unit->priv )
> > -            sched_free_udata(c->sched, unit->priv);
> > +            sched_free_udata(old_ops, unit->priv);
> >           old_unit = unit;
> >           unit = unit->next_in_list;
> >           xfree(old_unit);
> > ---->8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch.
>
> Thoughts?

I only see a PGP key and signature.  Did you forget to attach the patch?

 -George