[PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()

Jan Beulich posted 1 patch 3 months, 2 weeks ago
Failed in applying to current master (apply log)
[PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Jan Beulich 3 months, 2 weeks ago
Using plain (signed) int variables as array indexes can be unhelpful on at
least x86, where the compiler may see the need to insert sign-extension
insns (strictly speaking it should be able to avoid that when the loop
continuation condition says >= 0, but that's not generally the case even
with gcc15).

Observed effects with gcc15 (will of course vary with compiler version and
level of optimization):
- on x86, one less preserved register in use, yet due to sub-optimal
  choice of register variables still a small code size increase (%r12
  isn't a good choice when it's used for base-without-index addressing, as
  it requires a SIB byte which other registers wouldn't require),
- on Arm64 code size decreases, albeit that's eaten up by padding which is
  being inserted ahead of a few labels,
- on Arm32 code size increases for a reason I didn't fully understand (my
  ability to read Arm assembly is still somewhat limited).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
 {
     struct domain *d = container_of(head, struct domain, rcu);
     struct vcpu *v;
-    int i;
+    unsigned int i;
 
     /*
      * Flush all state for the vCPU previously having run on the current CPU.
@@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
      */
     sync_local_execstate();
 
-    for ( i = d->max_vcpus - 1; i >= 0; i-- )
+    for ( i = d->max_vcpus; i-- > 0; )
     {
         if ( (v = d->vcpu[i]) == NULL )
             continue;
@@ -1511,7 +1511,7 @@ static void cf_check complete_domain_des
     xfree(d->vm_event_share);
 #endif
 
-    for ( i = d->max_vcpus - 1; i >= 0; i-- )
+    for ( i = d->max_vcpus; i-- > 0; )
         if ( (v = d->vcpu[i]) != NULL )
             vcpu_destroy(v);
Re: [PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Roger Pau Monné 3 months, 1 week ago
On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> Using plain (signed) int variables as array indexes can be unhelpful on at
> least x86, where the compiler may see the need to insert sign-extension
> insns (strictly speaking it should be able to avoid that when the loop
> continuation condition says >= 0, but that's not generally the case even
> with gcc15).
> 
> Observed effects with gcc15 (will of course vary with compiler version and
> level of optimization):
> - on x86, one less preserved register in use, yet due to sub-optimal
>   choice of register variables still a small code size increase (%r12
>   isn't a good choice when it's used for base-without-index addressing, as
>   it requires a SIB byte which other registers wouldn't require),
> - on Arm64 code size decreases, albeit that's eaten up by padding which is
>   being inserted ahead of a few labels,
> - on Arm32 code size increases for a reason I didn't fully understand (my
>   ability to read Arm assembly is still somewhat limited).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
>  {
>      struct domain *d = container_of(head, struct domain, rcu);
>      struct vcpu *v;
> -    int i;
> +    unsigned int i;
>  
>      /*
>       * Flush all state for the vCPU previously having run on the current CPU.
> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
>       */
>      sync_local_execstate();
>  
> -    for ( i = d->max_vcpus - 1; i >= 0; i-- )
> +    for ( i = d->max_vcpus; i-- > 0; )

Is there any reason we need to do those loops backwards?

I would rather do:

for ( i = 0; i < d->max_vcpus; i++ )

?

Thanks, Roger.
Re: [PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Jan Beulich 3 months, 1 week ago
On 04.03.2026 16:38, Roger Pau Monné wrote:
> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
>>  {
>>      struct domain *d = container_of(head, struct domain, rcu);
>>      struct vcpu *v;
>> -    int i;
>> +    unsigned int i;
>>  
>>      /*
>>       * Flush all state for the vCPU previously having run on the current CPU.
>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
>>       */
>>      sync_local_execstate();
>>  
>> -    for ( i = d->max_vcpus - 1; i >= 0; i-- )
>> +    for ( i = d->max_vcpus; i-- > 0; )
> 
> Is there any reason we need to do those loops backwards?
> 
> I would rather do:
> 
> for ( i = 0; i < d->max_vcpus; i++ )
> 
> ?

I think it's better to keep like this. The latter of the loops would better
clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
which assumes that for ordinary domains d->vcpu[] is populated contiguously.
Hardly any code should touch the vCPU-s of a domain destructed this far, but
still better safe than sorry, I guess.

In no case would I like to mix both changes.

Jan

Re: [PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Roger Pau Monné 3 months, 1 week ago
On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
> On 04.03.2026 16:38, Roger Pau Monné wrote:
> > On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> >> --- a/xen/common/domain.c
> >> +++ b/xen/common/domain.c
> >> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
> >>  {
> >>      struct domain *d = container_of(head, struct domain, rcu);
> >>      struct vcpu *v;
> >> -    int i;
> >> +    unsigned int i;
> >>  
> >>      /*
> >>       * Flush all state for the vCPU previously having run on the current CPU.
> >> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
> >>       */
> >>      sync_local_execstate();
> >>  
> >> -    for ( i = d->max_vcpus - 1; i >= 0; i-- )
> >> +    for ( i = d->max_vcpus; i-- > 0; )
> > 
> > Is there any reason we need to do those loops backwards?
> > 
> > I would rather do:
> > 
> > for ( i = 0; i < d->max_vcpus; i++ )
> > 
> > ?
> 
> I think it's better to keep like this. The latter of the loops would better
> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
> Hardly any code should touch the vCPU-s of a domain destructed this far, but
> still better safe than sorry, I guess.

Yes, you are right.  sched_destroy_vcpu() relies on this specific
top-down calling.

Since you are adjusting the code anyway, it might be worth writing
down that some functions (like sched_destroy_vcpu()) expect to be
called with a top-down order of vCPUs.

For the change itself:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

Re: [PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Jan Beulich 3 months, 1 week ago
On 04.03.2026 18:36, Roger Pau Monné wrote:
> On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
>> On 04.03.2026 16:38, Roger Pau Monné wrote:
>>> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
>>>>  {
>>>>      struct domain *d = container_of(head, struct domain, rcu);
>>>>      struct vcpu *v;
>>>> -    int i;
>>>> +    unsigned int i;
>>>>  
>>>>      /*
>>>>       * Flush all state for the vCPU previously having run on the current CPU.
>>>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
>>>>       */
>>>>      sync_local_execstate();
>>>>  
>>>> -    for ( i = d->max_vcpus - 1; i >= 0; i-- )
>>>> +    for ( i = d->max_vcpus; i-- > 0; )
>>>
>>> Is there any reason we need to do those loops backwards?
>>>
>>> I would rather do:
>>>
>>> for ( i = 0; i < d->max_vcpus; i++ )
>>>
>>> ?
>>
>> I think it's better to keep like this. The latter of the loops would better
>> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
>> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
>> Hardly any code should touch the vCPU-s of a domain destructed this far, but
>> still better safe than sorry, I guess.
> 
> Yes, you are right.  sched_destroy_vcpu() relies on this specific
> top-down calling.
> 
> Since you are adjusting the code anyway, it might be worth writing
> down that some functions (like sched_destroy_vcpu()) expect to be
> called with a top-down order of vCPUs.

I've added

    /*
     * Iterating downwards is a requirement here, as e.g. sched_destroy_vcpu()
     * relies on this.
     */

ahead of the first of the two loops.

> For the change itself:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan

Re: [PATCH] domain: use unsigned loop induction variable in complete_domain_destroy()
Posted by Roger Pau Monné 3 months, 1 week ago
On Thu, Mar 05, 2026 at 09:07:55AM +0100, Jan Beulich wrote:
> On 04.03.2026 18:36, Roger Pau Monné wrote:
> > On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
> >> On 04.03.2026 16:38, Roger Pau Monné wrote:
> >>> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/common/domain.c
> >>>> +++ b/xen/common/domain.c
> >>>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
> >>>>  {
> >>>>      struct domain *d = container_of(head, struct domain, rcu);
> >>>>      struct vcpu *v;
> >>>> -    int i;
> >>>> +    unsigned int i;
> >>>>  
> >>>>      /*
> >>>>       * Flush all state for the vCPU previously having run on the current CPU.
> >>>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
> >>>>       */
> >>>>      sync_local_execstate();
> >>>>  
> >>>> -    for ( i = d->max_vcpus - 1; i >= 0; i-- )
> >>>> +    for ( i = d->max_vcpus; i-- > 0; )
> >>>
> >>> Is there any reason we need to do those loops backwards?
> >>>
> >>> I would rather do:
> >>>
> >>> for ( i = 0; i < d->max_vcpus; i++ )
> >>>
> >>> ?
> >>
> >> I think it's better to keep like this. The latter of the loops would better
> >> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
> >> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
> >> Hardly any code should touch the vCPU-s of a domain destructed this far, but
> >> still better safe than sorry, I guess.
> > 
> > Yes, you are right.  sched_destroy_vcpu() relies on this specific
> > top-down calling.
> > 
> > Since you are adjusting the code anyway, it might be worth writing
> > down that some functions (like sched_destroy_vcpu()) expect to be
> > called with a top-down order of vCPUs.
> 
> I've added
> 
>     /*
>      * Iterating downwards is a requirement here, as e.g. sched_destroy_vcpu()
>      * relies on this.
>      */
> 
> ahead of the first of the two loops.

Thank you for adjusting that.

Roger.