The label used upon the function failing is wrong. Instead of correcting
the label, move the invocation up a little, to also avoid it altogether
for the idle domain (where ->vmtrace_size would be zero, and hence the
function would bail right away anyway).
Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
set_bit(_VPF_down, &v->pause_flags);
vcpu_info_reset(v);
init_waitqueue_vcpu(v);
+
+ if ( vmtrace_alloc_buffer(v) != 0 )
+ goto fail_wq;
}
if ( sched_init_vcpu(v) != 0 )
goto fail_wq;
- if ( vmtrace_alloc_buffer(v) != 0 )
- goto fail_wq;
-
if ( arch_vcpu_create(v) != 0 )
goto fail_sched;
On Mon, Feb 16, 2026 at 04:51:53PM +0100, Jan Beulich wrote:
> The label used upon the function failing is wrong. Instead of correcting
> the label, move the invocation up a little, to also avoid it altogether
> for the idle domain (where ->vmtrace_size would be zero, and hence the
> function would bail right away anyway).
>
> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
On 16/02/2026 3:51 pm, Jan Beulich wrote:
> The label used upon the function failing is wrong.
Is it? Which label do you think it ought to be?
> Instead of correcting
> the label, move the invocation up a little, to also avoid it altogether
> for the idle domain (where ->vmtrace_size would be zero, and hence the
> function would bail right away anyway).
>
> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
> set_bit(_VPF_down, &v->pause_flags);
> vcpu_info_reset(v);
> init_waitqueue_vcpu(v);
> +
> + if ( vmtrace_alloc_buffer(v) != 0 )
> + goto fail_wq;
> }
>
> if ( sched_init_vcpu(v) != 0 )
> goto fail_wq;
>
> - if ( vmtrace_alloc_buffer(v) != 0 )
> - goto fail_wq;
> -
> if ( arch_vcpu_create(v) != 0 )
> goto fail_sched;
>
The positioning was intentional. I just didn't get to wiring up Intel
PT for Xen context yet, and tying the buffer to the idle vCPU would be
the obvious choice there.
The chances of getting around to that are admittedly low, so I don't
mind moving the call in principle (noting that this is a wish that
likely won't materialise), but the claim over the label needs resolving.
~Andrew
On 16.02.2026 17:29, Andrew Cooper wrote:
> On 16/02/2026 3:51 pm, Jan Beulich wrote:
>> The label used upon the function failing is wrong.
>
> Is it? Which label do you think it ought to be?
fail_sched, as Roger did point out in reply to the original other patch.
After all ...
>> Instead of correcting
>> the label, move the invocation up a little, to also avoid it altogether
>> for the idle domain (where ->vmtrace_size would be zero, and hence the
>> function would bail right away anyway).
>>
>> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
>> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
>> set_bit(_VPF_down, &v->pause_flags);
>> vcpu_info_reset(v);
>> init_waitqueue_vcpu(v);
>> +
>> + if ( vmtrace_alloc_buffer(v) != 0 )
>> + goto fail_wq;
>> }
>>
>> if ( sched_init_vcpu(v) != 0 )
>> goto fail_wq;
... this comes first, and ...
>> - if ( vmtrace_alloc_buffer(v) != 0 )
>> - goto fail_wq;
>> -
>> if ( arch_vcpu_create(v) != 0 )
>> goto fail_sched;
... here the correct label is used.
Jan
> The positioning was intentional. I just didn't get to wiring up Intel
> PT for Xen context yet, and tying the buffer to the idle vCPU would be
> the obvious choice there.
>
> The chances of getting around to that are admittedly low, so I don't
> mind moving the call in principle (noting that this is a wish that
> likely won't materialise), but the claim over the label needs resolving.
>
> ~Andrew
On 16/02/2026 4:39 pm, Jan Beulich wrote:
> On 16.02.2026 17:29, Andrew Cooper wrote:
>> On 16/02/2026 3:51 pm, Jan Beulich wrote:
>>> The label used upon the function failing is wrong.
>> Is it? Which label do you think it ought to be?
> fail_sched, as Roger did point out in reply to the original other patch.
> After all ...
>
>>> Instead of correcting
>>> the label, move the invocation up a little, to also avoid it altogether
>>> for the idle domain (where ->vmtrace_size would be zero, and hence the
>>> function would bail right away anyway).
>>>
>>> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
>>> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
>>> set_bit(_VPF_down, &v->pause_flags);
>>> vcpu_info_reset(v);
>>> init_waitqueue_vcpu(v);
>>> +
>>> + if ( vmtrace_alloc_buffer(v) != 0 )
>>> + goto fail_wq;
>>> }
>>>
>>> if ( sched_init_vcpu(v) != 0 )
>>> goto fail_wq;
> ... this comes first, and ...
>
>>> - if ( vmtrace_alloc_buffer(v) != 0 )
>>> - goto fail_wq;
>>> -
>>> if ( arch_vcpu_create(v) != 0 )
>>> goto fail_sched;
> ... here the correct label is used.
Eww, yes. So multiple observations.
1) This only functions in the first place because
destroy_waitqueue_vcpu() is idempotent to v->waitqueue_vcpu being NULL
which covers the idle case where init_waitqueue_vcpu() was never called.
2) sched_destroy_vcpu() can be made idempotent against v->sched_unit.
Then we don't need multiple labels and this all gets a lot easier to
untangle.
~Andrew
On 16.02.2026 18:17, Andrew Cooper wrote:
> On 16/02/2026 4:39 pm, Jan Beulich wrote:
>> On 16.02.2026 17:29, Andrew Cooper wrote:
>>> On 16/02/2026 3:51 pm, Jan Beulich wrote:
>>>> The label used upon the function failing is wrong.
>>> Is it? Which label do you think it ought to be?
>> fail_sched, as Roger did point out in reply to the original other patch.
>> After all ...
>>
>>>> Instead of correcting
>>>> the label, move the invocation up a little, to also avoid it altogether
>>>> for the idle domain (where ->vmtrace_size would be zero, and hence the
>>>> function would bail right away anyway).
>>>>
>>>> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation parameter")
>>>> Reported-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
>>>> set_bit(_VPF_down, &v->pause_flags);
>>>> vcpu_info_reset(v);
>>>> init_waitqueue_vcpu(v);
>>>> +
>>>> + if ( vmtrace_alloc_buffer(v) != 0 )
>>>> + goto fail_wq;
>>>> }
>>>>
>>>> if ( sched_init_vcpu(v) != 0 )
>>>> goto fail_wq;
>> ... this comes first, and ...
>>
>>>> - if ( vmtrace_alloc_buffer(v) != 0 )
>>>> - goto fail_wq;
>>>> -
>>>> if ( arch_vcpu_create(v) != 0 )
>>>> goto fail_sched;
>> ... here the correct label is used.
>
> Eww, yes. So multiple observations.
>
> 1) This only functions in the first place because
> destroy_waitqueue_vcpu() is idempotent to v->waitqueue_vcpu being NULL
> which covers the idle case where init_waitqueue_vcpu() was never called.
>
> 2) sched_destroy_vcpu() can be made idempotent against v->sched_unit.
>
> Then we don't need multiple labels and this all gets a lot easier to
> untangle.
Yes, but as a backportable fix what I have here is the most suitable
first step, I'd say.
With what you suggest, I'd then want to check whether either or both of
the function invocations could move into vcpu_teardown(). At least for
destroy_waitqueue_vcpu() I can't really figure why it's called only in
complete_domain_destroy(); for sched_destroy_vcpu() it may well be that
it can't be done earlier. Or wait, looks like vm_event_cleanup() would
need moving up in domain_kill(). The comment there right now explains
why it can't be done later; it's not quite clear to me whether moving
it ahead of (or into) domain_teardown() would introduce any problems.
Tamas?
Jan
© 2016 - 2026 Red Hat, Inc.