[Xen-devel] [PATCH] sched/core: Fix bug when moving a domain between cpupools

Jeff Kubascik posted 1 patch 4 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200327193023.506-1-jeff.kubascik@dornerworks.com
Maintainers: Dario Faggioli <dfaggioli@suse.com>, George Dunlap <george.dunlap@citrix.com>
xen/common/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[Xen-devel] [PATCH] sched/core: Fix bug when moving a domain between cpupools
Posted by Jeff Kubascik 4 years ago
For each UNIT, sched_set_affinity is called before unit->priv is updated
to the new cpupool private UNIT data structure. The issue is
sched_set_affinity will call the adjust_affinity method of the cpupool.
If defined, the new cpupool may use unit->priv (e.g. credit), which at
this point still references the old cpupool private UNIT data structure.

This change fixes the bug by moving the switch of unit->priv earler in
the function.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
Hello,

I've been working on updating the arinc653 scheduler to support
multicore for a few months now. In the process of testing, I came across
this obscure bug in the core scheduler code that took me a few weeks to
track down. This bug resulted in the credit scheduler writing past the
end of the arinc653 private UNIT data structure into the TLSF allocator
bhdr structure of the adjacent region. This required some deep diving
into the TLSF allocator code to trace the bug back to this point.

Sincerely,
Jeff Kubascik
---
 xen/common/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 7e8e7d2c39..ea572a345a 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -686,6 +686,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         unsigned int unit_p = new_p;
 
         unitdata = unit->priv;
+        unit->priv = unit_priv[unit_idx];
 
         for_each_sched_unit_vcpu ( unit, v )
         {
@@ -707,7 +708,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        unit->priv = unit_priv[unit_idx];
         if ( !d->is_dying )
             sched_move_irqs(unit);
 
-- 
2.17.1


Re: [PATCH] sched/core: Fix bug when moving a domain between cpupools
Posted by Jeff Kubascik 3 years, 11 months ago
Hello,

I wanted to follow up on this patch, as I have not seen any responses to it.

In my work on the ARINC653 scheduler, I have observed this bug write to memory
past the end of a private UNIT structure (and in my case, stomping on a TLSF
allocator header) when migrating a domain from an ARINC cpupool to a credit
cpupool. This occurs because (a) the private UNIT structure is smaller for the
ARINC cpupool and (b) the credit scheduler method csched_aff_cntl does some bit
setting/ clearing while the private UNIT pointer still points incorrectly to the
ARINC cpupool one.

On 3/27/2020 3:30 PM, Jeff Kubascik wrote:
> For each UNIT, sched_set_affinity is called before unit->priv is updated
> to the new cpupool private UNIT data structure. The issue is
> sched_set_affinity will call the adjust_affinity method of the cpupool.
> If defined, the new cpupool may use unit->priv (e.g. credit), which at
> this point still references the old cpupool private UNIT data structure.
> 
> This change fixes the bug by moving the switch of unit->priv earler in
> the function.
> 
> Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
> ---
> Hello,
> 
> I've been working on updating the arinc653 scheduler to support
> multicore for a few months now. In the process of testing, I came across
> this obscure bug in the core scheduler code that took me a few weeks to
> track down. This bug resulted in the credit scheduler writing past the
> end of the arinc653 private UNIT data structure into the TLSF allocator
> bhdr structure of the adjacent region. This required some deep diving
> into the TLSF allocator code to trace the bug back to this point.
> 
> Sincerely,
> Jeff Kubascik
> ---
>  xen/common/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 7e8e7d2c39..ea572a345a 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -686,6 +686,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>          unsigned int unit_p = new_p;
>  
>          unitdata = unit->priv;
> +        unit->priv = unit_priv[unit_idx];
>  
>          for_each_sched_unit_vcpu ( unit, v )
>          {
> @@ -707,7 +708,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>           */
>          spin_unlock_irq(lock);
>  
> -        unit->priv = unit_priv[unit_idx];
>          if ( !d->is_dying )
>              sched_move_irqs(unit);
>  
> 

Sincerely,
Jeff Kubascik

Re: [Xen-devel] [PATCH] sched/core: Fix bug when moving a domain between cpupools
Posted by Jürgen Groß 3 years, 11 months ago
On 27.03.20 20:30, Jeff Kubascik wrote:
> For each UNIT, sched_set_affinity is called before unit->priv is updated
> to the new cpupool private UNIT data structure. The issue is
> sched_set_affinity will call the adjust_affinity method of the cpupool.
> If defined, the new cpupool may use unit->priv (e.g. credit), which at
> this point still references the old cpupool private UNIT data structure.
> 
> This change fixes the bug by moving the switch of unit->priv earler in
> the function.
> 
> Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

Re: [Xen-devel] [PATCH] sched/core: Fix bug when moving a domain between cpupools
Posted by Dario Faggioli 3 years, 11 months ago
On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote:
> On 27.03.20 20:30, Jeff Kubascik wrote:
> > For each UNIT, sched_set_affinity is called before unit->priv is
> > updated
> > to the new cpupool private UNIT data structure. The issue is
> > sched_set_affinity will call the adjust_affinity method of the
> > cpupool.
> > If defined, the new cpupool may use unit->priv (e.g. credit), which
> > at
> > this point still references the old cpupool private UNIT data
> > structure.
> > 
> > This change fixes the bug by moving the switch of unit->priv earler
> > in
> > the function.
> > 
> > Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

Sorry it took a while. And thanks Juergen for also having a look.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Re: Re: [Xen-devel] [PATCH] sched/core: Fix bug when moving a domain between cpupools
Posted by Jeff Kubascik 3 years, 11 months ago
Thank you Juergen and Dario!

On 4/16/2020 12:09 PM, Dario Faggioli wrote:
> On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote:
>> On 27.03.20 20:30, Jeff Kubascik wrote:
>>> For each UNIT, sched_set_affinity is called before unit->priv is
>>> updated
>>> to the new cpupool private UNIT data structure. The issue is
>>> sched_set_affinity will call the adjust_affinity method of the
>>> cpupool.
>>> If defined, the new cpupool may use unit->priv (e.g. credit), which
>>> at
>>> this point still references the old cpupool private UNIT data
>>> structure.
>>>
>>> This change fixes the bug by moving the switch of unit->priv earler
>>> in
>>> the function.
>>>
>>> Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
> Acked-by: Dario Faggioli <dfaggioli@suse.com>
> 
> Sorry it took a while. And thanks Juergen for also having a look.
> 
> Regards
> 

Sincerely,
Jeff Kubascik