[PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration

Ross Lagerwall posted 2 patches 3 days, 14 hours ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20260609151528.2426788-1-ross.lagerwall@citrix.com
[PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Ross Lagerwall 3 days, 14 hours ago
When performing multiple migrations in parallel, the domctl lock may
become extremely contended:

* Operations like "xl vcpu-list" were observed to take in excess of 20s
  to execute.
* The "clean" shadow op may pause the domain, restart with a
  continuation and then become blocked on the domctl lock, causing VM
  downtime in excess of 20 seconds.

These issues can be fixed by not holding the domctl lock for the
operations called frequently during migration.

Thanks

Ross Lagerwall (2):
  domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock
  domctl: Handle some of XEN_DOMCTL_shadow_op without the domctl lock

 xen/arch/x86/domctl.c    |  4 ++++
 xen/arch/x86/mm/paging.c |  8 ++++++--
 xen/common/domctl.c      | 13 +++++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

-- 
2.53.0
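
The idea in the cover letter above (let the operations that migration calls
frequently skip the global domctl lock) can be illustrated with a small,
standalone sketch. This is not the code from the patches: every name
prefixed with sketch_ or SKETCH_ is hypothetical, the lock is modelled with
a C11 atomic_flag rather than Xen's own locking primitives, and the choice
of which shadow sub-ops run without the lock is an assumption made only for
illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical op identifiers; not the real XEN_DOMCTL_* values. */
enum sketch_op {
    SKETCH_OP_GETPAGEFRAMEINFO3, /* stands in for XEN_DOMCTL_getpageframeinfo3 */
    SKETCH_OP_SHADOW_CLEAN,      /* stands in for a "clean" shadow op */
    SKETCH_OP_VCPU_QUERY,        /* stands in for any op still serialised */
};

/* Hypothetical "lock is busy, retry later" return value. */
#define SKETCH_EBUSY (-1)

/* One global lock shared by all callers, as with the domctl lock. */
static atomic_flag sketch_domctl_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock without blocking; true on success. */
static bool sketch_lock_try(void)
{
    return !atomic_flag_test_and_set(&sketch_domctl_lock);
}

static void sketch_lock_release(void)
{
    atomic_flag_clear(&sketch_domctl_lock);
}

/* Assumption: only these two ops are safe to run without the lock. */
static bool sketch_op_needs_lock(enum sketch_op op)
{
    switch (op) {
    case SKETCH_OP_GETPAGEFRAMEINFO3:
    case SKETCH_OP_SHADOW_CLEAN:
        return false;
    default:
        return true;
    }
}

/* Placeholder for the real per-op work; always succeeds here. */
static int sketch_handle_op(enum sketch_op op)
{
    (void)op;
    return 0;
}

/* Dispatch one op, taking the global lock only when the op requires it. */
static int sketch_do_domctl(enum sketch_op op)
{
    int ret;

    if (!sketch_op_needs_lock(op))
        return sketch_handle_op(op); /* never waits on the global lock */

    if (!sketch_lock_try())
        return SKETCH_EBUSY;         /* caller is expected to retry */

    ret = sketch_handle_op(op);
    sketch_lock_release();
    return ret;
}
```

In this model, while some other caller holds the lock,
sketch_do_domctl(SKETCH_OP_VCPU_QUERY) returns SKETCH_EBUSY, whereas the two
migration-style ops still complete. The lock-free ops also no longer add to
the contention seen by the ops that do take the lock.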
Re: [PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Jan Beulich 1 day, 15 hours ago
On 09.06.2026 17:15, Ross Lagerwall wrote:
> When performing multiple migrations in parallel, the domctl lock may
> become extremely contended:
> 
> * Operations like "xl vcpu-list" were observed to take in excess of 20s
>   to execute.

Does "xl vcpu-list" involve ...

> * The "clean" shadow op may pause the domain, restart with a
>   continuation and then become blocked on the domctl lock, causing VM
>   downtime in excess of 20 seconds.
> 
> These issues can be fixed by not holding the domctl for the frequently
> called operations during migration.
> 
> Thanks
> 
> Ross Lagerwall (2):
>   domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock

... XEN_DOMCTL_getpageframeinfo3?

Jan

>   domctl: Handle some of XEN_DOMCTL_shadow_op without the domctl lock
> 
>  xen/arch/x86/domctl.c    |  4 ++++
>  xen/arch/x86/mm/paging.c |  8 ++++++--
>  xen/common/domctl.c      | 13 +++++++++++++
>  3 files changed, 23 insertions(+), 2 deletions(-)
>
Re: [PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Ross Lagerwall 1 day, 14 hours ago
On 6/11/26 3:55 PM, Jan Beulich wrote:
> On 09.06.2026 17:15, Ross Lagerwall wrote:
>> When performing multiple migrations in parallel, the domctl lock may
>> become extremely contended:
>>
>> * Operations like "xl vcpu-list" were observed to take in excess of 20s
>>    to execute.
> 
> Does "xl vcpu-list" involve ...
> 
>> * The "clean" shadow op may pause the domain, restart with a
>>    continuation and then become blocked on the domctl lock, causing VM
>>    downtime in excess of 20 seconds.
>>
>> These issues can be fixed by not holding the domctl for the frequently
>> called operations during migration.
>>
>> Thanks
>>
>> Ross Lagerwall (2):
>>    domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock
> 
> ... XEN_DOMCTL_getpageframeinfo3?
> 

No, but "xl vcpu-list" takes the domctl lock and this contends with
XEN_DOMCTL_getpageframeinfo3 and XEN_DOMCTL_shadow_op taking the domctl lock
which are called frequently by the migration process(es).

Various other operations were slow due to the domctl lock contention but "xl
vcpu-list" was the most obviously visible example.

Ross
Re: [PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Jan Beulich 1 day, 14 hours ago
On 11.06.2026 18:02, Ross Lagerwall wrote:
> On 6/11/26 3:55 PM, Jan Beulich wrote:
>> On 09.06.2026 17:15, Ross Lagerwall wrote:
>>> When performing multiple migrations in parallel, the domctl lock may
>>> become extremely contended:
>>>
>>> * Operations like "xl vcpu-list" were observed to take in excess of 20s
>>>    to execute.
>>
>> Does "xl vcpu-list" involve ...
>>
>>> * The "clean" shadow op may pause the domain, restart with a
>>>    continuation and then become blocked on the domctl lock, causing VM
>>>    downtime in excess of 20 seconds.
>>>
>>> These issues can be fixed by not holding the domctl for the frequently
>>> called operations during migration.
>>>
>>> Thanks
>>>
>>> Ross Lagerwall (2):
>>>    domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock
>>
>> ... XEN_DOMCTL_getpageframeinfo3?
>>
> 
> No, but "xl vcpu-list" takes the domctl lock

If this is still the case after XSA-492, then maybe the follow-ups I have
pending to post will eliminate (or at least reduce) this. I don't think
that's 4.22 material, though.

> and this contends with
> XEN_DOMCTL_getpageframeinfo3 and XEN_DOMCTL_shadow_op taking the domctl lock
> which are called frequently by the migration process(es).
> 
> Various other operations were slow due to the domctl lock contention but "xl
> vcpu-list" was the most obviously visible example.

I see.

Jan

Re: [PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Ross Lagerwall 2 days, 20 hours ago
On 6/9/26 4:15 PM, Ross Lagerwall wrote:
> When performing multiple migrations in parallel, the domctl lock may
> become extremely contended:
> 
> * Operations like "xl vcpu-list" were observed to take in excess of 20s
>    to execute.
> * The "clean" shadow op may pause the domain, restart with a
>    continuation and then become blocked on the domctl lock, causing VM
>    downtime in excess of 20 seconds.
> 
> These issues can be fixed by not holding the domctl for the frequently
> called operations during migration.
> 
> Thanks
> 
> Ross Lagerwall (2):
>    domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock
>    domctl: Handle some of XEN_DOMCTL_shadow_op without the domctl lock
> 
>   xen/arch/x86/domctl.c    |  4 ++++
>   xen/arch/x86/mm/paging.c |  8 ++++++--
>   xen/common/domctl.c      | 13 +++++++++++++
>   3 files changed, 23 insertions(+), 2 deletions(-)
> 

I'd like to request inclusion of this in 4.22 since it fixes a real
customer issue we have observed and would have been posted some time ago
but was delayed to avoid drawing attention to and colliding with
XSA-492.

Thanks,
Ross
Re: [PATCH v1 0/2] domctl: Avoid taking domctl lock for certain ops used during migration
Posted by Oleksii Kurochko 2 days, 18 hours ago

On 6/10/26 11:57 AM, Ross Lagerwall wrote:
> On 6/9/26 4:15 PM, Ross Lagerwall wrote:
>> When performing multiple migrations in parallel, the domctl lock may
>> become extremely contended:
>>
>> * Operations like "xl vcpu-list" were observed to take in excess of 20s
>>    to execute.
>> * The "clean" shadow op may pause the domain, restart with a
>>    continuation and then become blocked on the domctl lock, causing VM
>>    downtime in excess of 20 seconds.
>>
>> These issues can be fixed by not holding the domctl for the frequently
>> called operations during migration.
>>
>> Thanks
>>
>> Ross Lagerwall (2):
>>    domctl: Handle XEN_DOMCTL_getpageframeinfo3 without the domctl lock
>>    domctl: Handle some of XEN_DOMCTL_shadow_op without the domctl lock
>>
>>   xen/arch/x86/domctl.c    |  4 ++++
>>   xen/arch/x86/mm/paging.c |  8 ++++++--
>>   xen/common/domctl.c      | 13 +++++++++++++
>>   3 files changed, 23 insertions(+), 2 deletions(-)
>>
> 
> I'd like to request inclusion of this in 4.22 since it fixes a real
> customer issue we have observed and would have been posted some time ago
> but was delayed to avoid drawing attention to and colliding with
> XSA-492.

Considering this and the performance improvements, it would be really nice to
have it in 4.22:
  Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

~ Oleksii