Using plain (signed) int variables as array indexes can be unhelpful on at
least x86, where the compiler may see the need to insert sign-extension
insns (strictly speaking it should be able to avoid that when the loop
continuation condition says >= 0, but that's not generally the case even
with gcc15).
Observed effects with gcc15 (will of course vary with compiler version and
level of optimization):
- on x86, one less preserved register in use, yet due to sub-optimal
choice of register variables still a small code size increase (%r12
isn't a good choice when it's used for base-without-index addressing, as
it requires a SIB byte which other registers wouldn't require),
- on Arm64 code size decreases, albeit that's eaten up by padding which is
being inserted ahead of a few labels,
- on Arm32 code size increases for a reason I didn't fully understand (my
ability to read Arm assembly is still somewhat limited).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
{
struct domain *d = container_of(head, struct domain, rcu);
struct vcpu *v;
- int i;
+ unsigned int i;
/*
* Flush all state for the vCPU previously having run on the current CPU.
@@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
*/
sync_local_execstate();
- for ( i = d->max_vcpus - 1; i >= 0; i-- )
+ for ( i = d->max_vcpus; i-- > 0; )
{
if ( (v = d->vcpu[i]) == NULL )
continue;
@@ -1511,7 +1511,7 @@ static void cf_check complete_domain_des
xfree(d->vm_event_share);
#endif
- for ( i = d->max_vcpus - 1; i >= 0; i-- )
+ for ( i = d->max_vcpus; i-- > 0; )
if ( (v = d->vcpu[i]) != NULL )
vcpu_destroy(v);
On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> Using plain (signed) int variables as array indexes can be unhelpful on at
> least x86, where the compiler may see the need to insert sign-extension
> insns (strictly speaking it should be able to avoid that when the loop
> continuation condition says >= 0, but that's not generally the case even
> with gcc15).
>
> Observed effects with gcc15 (will of course vary with compiler version and
> level of optimization):
> - on x86, one less preserved register in use, yet due to sub-optimal
> choice of register variables still a small code size increase (%r12
> isn't a good choice when it's used for base-without-index addressing, as
> it requires a SIB byte which other registers wouldn't require),
> - on Arm64 code size decreases, albeit that's eaten up by padding which is
> being inserted ahead of a few labels,
> - on Arm32 code size increases for a reason I didn't fully understand (my
> ability to read Arm assembly is still somewhat limited).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
> {
> struct domain *d = container_of(head, struct domain, rcu);
> struct vcpu *v;
> - int i;
> + unsigned int i;
>
> /*
> * Flush all state for the vCPU previously having run on the current CPU.
> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
> */
> sync_local_execstate();
>
> - for ( i = d->max_vcpus - 1; i >= 0; i-- )
> + for ( i = d->max_vcpus; i-- > 0; )
Is there any reason we need to do those loops backwards?
I would rather do:
for ( i = 0; i < d->max_vcpus; i++ )
?
Thanks, Roger.
On 04.03.2026 16:38, Roger Pau Monné wrote:
> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
>> {
>> struct domain *d = container_of(head, struct domain, rcu);
>> struct vcpu *v;
>> - int i;
>> + unsigned int i;
>>
>> /*
>> * Flush all state for the vCPU previously having run on the current CPU.
>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
>> */
>> sync_local_execstate();
>>
>> - for ( i = d->max_vcpus - 1; i >= 0; i-- )
>> + for ( i = d->max_vcpus; i-- > 0; )
>
> Is there any reason we need to do those loops backwards?
>
> I would rather do:
>
> for ( i = 0; i < d->max_vcpus; i++ )
>
> ?
I think it's better to keep like this. The latter of the loops would better
clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
which assumes that for ordinary domains d->vcpu[] is populated contiguously.
Hardly any code should touch the vCPU-s of a domain destructed this far, but
still better safe than sorry, I guess.
In no case would I like to mix both changes.
Jan
On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
> On 04.03.2026 16:38, Roger Pau Monné wrote:
> > On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> >> --- a/xen/common/domain.c
> >> +++ b/xen/common/domain.c
> >> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
> >> {
> >> struct domain *d = container_of(head, struct domain, rcu);
> >> struct vcpu *v;
> >> - int i;
> >> + unsigned int i;
> >>
> >> /*
> >> * Flush all state for the vCPU previously having run on the current CPU.
> >> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
> >> */
> >> sync_local_execstate();
> >>
> >> - for ( i = d->max_vcpus - 1; i >= 0; i-- )
> >> + for ( i = d->max_vcpus; i-- > 0; )
> >
> > Is there any reason we need to do those loops backwards?
> >
> > I would rather do:
> >
> > for ( i = 0; i < d->max_vcpus; i++ )
> >
> > ?
>
> I think it's better to keep like this. The latter of the loops would better
> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
> Hardly any code should touch the vCPU-s of a domain destructed this far, but
> still better safe than sorry, I guess.
Yes, you are right. sched_destroy_vcpu() relies on this specific
top-down calling.
Since you are adjusting the code anyway, it might be worth writing
down that some functions (like sched_destroy_vcpu()) expect to be
called with a top-down order of vCPUs.
For the change itself:
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
On 04.03.2026 18:36, Roger Pau Monné wrote:
> On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
>> On 04.03.2026 16:38, Roger Pau Monné wrote:
>>> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
>>>> {
>>>> struct domain *d = container_of(head, struct domain, rcu);
>>>> struct vcpu *v;
>>>> - int i;
>>>> + unsigned int i;
>>>>
>>>> /*
>>>> * Flush all state for the vCPU previously having run on the current CPU.
>>>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
>>>> */
>>>> sync_local_execstate();
>>>>
>>>> - for ( i = d->max_vcpus - 1; i >= 0; i-- )
>>>> + for ( i = d->max_vcpus; i-- > 0; )
>>>
>>> Is there any reason we need to do those loops backwards?
>>>
>>> I would rather do:
>>>
>>> for ( i = 0; i < d->max_vcpus; i++ )
>>>
>>> ?
>>
>> I think it's better to keep like this. The latter of the loops would better
>> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
>> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
>> Hardly any code should touch the vCPU-s of a domain destructed this far, but
>> still better safe than sorry, I guess.
>
> Yes, you are right. sched_destroy_vcpu() relies on this specific
> top-down calling.
>
> Since you are adjusting the code anyway, it might be worth writing
> down that some functions (like sched_destroy_vcpu()) expect to be
> called with a top-down order of vCPUs.
I've added
/*
* Iterating downwards is a requirement here, as e.g. sched_destroy_vcpu()
* relies on this.
*/
ahead of the first of the two loops.
> For the change itself:
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks.
Jan
On Thu, Mar 05, 2026 at 09:07:55AM +0100, Jan Beulich wrote:
> On 04.03.2026 18:36, Roger Pau Monné wrote:
> > On Wed, Mar 04, 2026 at 04:48:07PM +0100, Jan Beulich wrote:
> >> On 04.03.2026 16:38, Roger Pau Monné wrote:
> >>> On Thu, Feb 26, 2026 at 10:01:45AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/common/domain.c
> >>>> +++ b/xen/common/domain.c
> >>>> @@ -1475,7 +1475,7 @@ static void cf_check complete_domain_des
> >>>> {
> >>>> struct domain *d = container_of(head, struct domain, rcu);
> >>>> struct vcpu *v;
> >>>> - int i;
> >>>> + unsigned int i;
> >>>>
> >>>> /*
> >>>> * Flush all state for the vCPU previously having run on the current CPU.
> >>>> @@ -1485,7 +1485,7 @@ static void cf_check complete_domain_des
> >>>> */
> >>>> sync_local_execstate();
> >>>>
> >>>> - for ( i = d->max_vcpus - 1; i >= 0; i-- )
> >>>> + for ( i = d->max_vcpus; i-- > 0; )
> >>>
> >>> Is there any reason we need to do those loops backwards?
> >>>
> >>> I would rather do:
> >>>
> >>> for ( i = 0; i < d->max_vcpus; i++ )
> >>>
> >>> ?
> >>
> >> I think it's better to keep like this. The latter of the loops would better
> >> clear d->vcpu[i] (to not leave a dangling pointer), and there may be code
> >> which assumes that for ordinary domains d->vcpu[] is populated contiguously.
> >> Hardly any code should touch the vCPU-s of a domain destructed this far, but
> >> still better safe than sorry, I guess.
> >
> > Yes, you are right. sched_destroy_vcpu() relies on this specific
> > top-down calling.
> >
> > Since you are adjusting the code anyway, it might be worth writing
> > down that some functions (like sched_destroy_vcpu()) expect to be
> > called with a top-down order of vCPUs.
>
> I've added
>
> /*
> * Iterating downwards is a requirement here, as e.g. sched_destroy_vcpu()
> * relies on this.
> */
>
> ahead of the first of the two loops.
Thank you for adjusting that.
Roger.
© 2016 - 2026 Red Hat, Inc.