[PATCH v3 0/5] Implement CPU hotplug on Arm

Mykyta Poturai posted 5 patches 2 weeks, 6 days ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/cover.1760083684.git.mykyta._5Fpoturai@epam.com
[PATCH v3 0/5] Implement CPU hotplug on Arm
Posted by Mykyta Poturai 2 weeks, 6 days ago
This series implements support for CPU hotplug/unplug on Arm. To achieve this,
several things need to be done:

1. Implement the XEN_SYSCTL_CPU_HOTPLUG_* calls.
2. Switch the timer and GIC maintenance interrupts to static irqactions, which
removes the need to free them during release_irq.
3. Enable the build of xen-hptool on Arm.
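
To illustrate item 2, here is a minimal sketch of why a statically allocated
action avoids any freeing on release. It is a toy model, not the code from
this series: struct toy_irqaction, toy_request_irq(), toy_setup_irq() and
toy_release_irq() are hypothetical stand-ins, and the assumption is simply
that a dynamically registered action is heap-allocated while a static one is
owned by the caller.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_IRQS 32

/* Toy stand-in for an IRQ action descriptor (not Xen's real definition). */
struct toy_irqaction {
    void (*handler)(int irq);
    const char *name;
    int free_on_release; /* set only for heap-allocated descriptors */
};

static struct toy_irqaction *toy_irq_table[TOY_NR_IRQS];
static int toy_free_calls; /* counts how often release had to free memory */

/* Dynamic registration: allocates the descriptor, so release must free it. */
static int toy_request_irq(int irq, void (*handler)(int), const char *name)
{
    struct toy_irqaction *action;

    assert(irq >= 0 && irq < TOY_NR_IRQS);
    action = malloc(sizeof(*action));
    if (action == NULL)
        return -1;
    action->handler = handler;
    action->name = name;
    action->free_on_release = 1;
    toy_irq_table[irq] = action;
    return 0;
}

/* Static registration: the caller owns the storage, nothing to free later. */
static void toy_setup_irq(int irq, struct toy_irqaction *action)
{
    assert(irq >= 0 && irq < TOY_NR_IRQS);
    action->free_on_release = 0;
    toy_irq_table[irq] = action;
}

/* Release: only heap-allocated descriptors ever reach the allocator. */
static void toy_release_irq(int irq)
{
    struct toy_irqaction *action;

    assert(irq >= 0 && irq < TOY_NR_IRQS);
    action = toy_irq_table[irq];
    toy_irq_table[irq] = NULL;
    if (action != NULL && action->free_on_release) {
        free(action);
        toy_free_calls++;
    }
}

static void toy_timer_handler(int irq) { (void)irq; }

/* Hypothetical static descriptor, in the spirit of item 2 above. */
static struct toy_irqaction toy_timer_action = {
    .handler = toy_timer_handler,
    .name = "toy-timer",
};
```

In this model, releasing an IRQ registered through toy_setup_irq() never
touches the allocator, so the release path stays usable in contexts where
freeing memory is undesirable.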

Tested on QEMU.

v2->v3:
* add docs

v1->v2:
* see individual patches

Mykyta Poturai (5):
  arm/time: Use static irqaction
  arm/gic: Use static irqaction
  arm/sysctl: Implement cpu hotplug ops
  tools: Allow building xen-hptool without CONFIG_MIGRATE
  docs: Document CPU hotplug

 config/Tools.mk.in               |  1 +
 docs/misc/cpu-hotplug.txt        | 51 ++++++++++++++++++++++++++++++++
 tools/configure                  | 30 +++++++++++++++++++
 tools/configure.ac               |  1 +
 tools/libs/guest/Makefile.common |  4 +++
 tools/misc/Makefile              |  2 +-
 xen/arch/arm/gic.c               | 11 +++++--
 xen/arch/arm/sysctl.c            | 45 ++++++++++++++++++++++++++++
 xen/arch/arm/time.c              | 21 ++++++++++---
 9 files changed, 159 insertions(+), 7 deletions(-)
 create mode 100644 docs/misc/cpu-hotplug.txt
 mode change 100755 => 100644 tools/configure

-- 
2.34.1
Re: [PATCH v3 0/5] Implement CPU hotplug on Arm
Posted by Mykola Kvach 2 weeks ago
Hi Mykyta,

Thanks for the series.

It seems there might be issues here -- please take a look and let me
know if my concerns are valid:

1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
configuration may be lost.

2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
table exists (is allocated) on bring-up. See
gicv3_lpi_allocate_pendtable() and its call chain.

3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
its affinity should be moved to an online CPU before completing the
offlining.

4. Race between the new hypercalls and disable/enable_nonboot_cpus():
disable_nonboot_cpus() is called, enable_nonboot_cpus() reads
frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
path may still run for an already-online CPU, risking use-after-free
of per-CPU state (e.g. via free_percpu_area()) and other issues
related to CPU_RESUME_FAILED notification.
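
To make the ordering in item 4 concrete, here is a toy model of the
interleaving. It is only a sketch under my own assumptions: every toy_* name
and TOY_ERR_ALREADY_ONLINE are hypothetical, the "hypercall" is simulated by
a hook that runs between reading the frozen set and calling the bring-up
function, and the guard shown is one possible mitigation, not something taken
from the series or from the existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_ERR_ALREADY_ONLINE (-1) /* hypothetical error code */

static bool toy_online[TOY_NR_CPUS];
static bool toy_frozen[TOY_NR_CPUS];
static int toy_resume_failed_calls; /* failure-path invocations */

typedef void (*toy_hook_t)(unsigned int cpu);

static void toy_reset(void)
{
    unsigned int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_online[cpu] = (cpu == 0); /* boot CPU stays online */
        toy_frozen[cpu] = false;
    }
    toy_resume_failed_calls = 0;
}

static int toy_cpu_up(unsigned int cpu)
{
    if (toy_online[cpu])
        return TOY_ERR_ALREADY_ONLINE;
    toy_online[cpu] = true;
    return 0;
}

/* Simulated hypercall that onlines the CPU behind the resume path's back. */
static void toy_hypercall_online(unsigned int cpu)
{
    toy_online[cpu] = true;
}

/*
 * Toy resume loop. The hook runs after the frozen set was consulted but
 * before toy_cpu_up(). With guard == true, "already online" is treated as
 * success, so the failure path is skipped for that CPU.
 */
static void toy_enable_nonboot_cpus(toy_hook_t racing_hypercall, bool guard)
{
    unsigned int cpu;

    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
        int err;

        if (!toy_frozen[cpu])
            continue;
        if (racing_hypercall != NULL)
            racing_hypercall(cpu);
        err = toy_cpu_up(cpu);
        if (err == 0 || (guard && err == TOY_ERR_ALREADY_ONLINE))
            toy_frozen[cpu] = false;
    }

    /* Failure path: runs for every CPU still marked frozen. */
    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
        if (toy_frozen[cpu]) {
            toy_resume_failed_calls++;
            toy_frozen[cpu] = false;
        }
    }
}
```

Without the guard, the failure path runs for a CPU that is in fact online,
which is the situation I am worried about; with the guard it does not.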



On Fri, Oct 10, 2025 at 12:36 PM Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> This series implements support for CPU hotplug/unplug on Arm. To achieve this,
> several things need to be done:
>
> 1. XEN_SYSCTL_CPU_HOTPLUG_* calls implemented.
> 2. timer and GIC maintenance interrupts switched to static irqactions to remove
> the need for freeing them during release_irq.
> 3. Enabled the build of xen-hptool on Arm.
>
> Tested on QEMU.
>
> v2->v3:
> * add docs
>
> v1->v2:
> * see individual patches
>
> Mykyta Poturai (5):
>   arm/time: Use static irqaction
>   arm/gic: Use static irqaction
>   arm/sysctl: Implement cpu hotplug ops
>   tools: Allow building xen-hptool without CONFIG_MIGRATE
>   docs: Document CPU hotplug
>
>  config/Tools.mk.in               |  1 +
>  docs/misc/cpu-hotplug.txt        | 51 ++++++++++++++++++++++++++++++++
>  tools/configure                  | 30 +++++++++++++++++++
>  tools/configure.ac               |  1 +
>  tools/libs/guest/Makefile.common |  4 +++
>  tools/misc/Makefile              |  2 +-
>  xen/arch/arm/gic.c               | 11 +++++--
>  xen/arch/arm/sysctl.c            | 45 ++++++++++++++++++++++++++++
>  xen/arch/arm/time.c              | 21 ++++++++++---
>  9 files changed, 159 insertions(+), 7 deletions(-)
>  create mode 100644 docs/misc/cpu-hotplug.txt
>  mode change 100755 => 100644 tools/configure
>
> --
> 2.34.1
>

Best regards,
Mykola
Re: [PATCH v3 0/5] Implement CPU hotplug on Arm
Posted by Mykyta Poturai 1 week, 3 days ago
On 15.10.25 20:30, Mykola Kvach wrote:
> Hi Mykyta,
> 
> Thanks for the series.
> 
> It seems there might be issues here -- please take a look and let me
> know if my concerns are valid:
> 
> 1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
> configuration may be lost.

OP-TEE and FF-A are marked as unsupported.

> 2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
> table exists (is allocated) on bring-up. See
> gicv3_lpi_allocate_pendtable() and its call chain.

ITS is marked as unsupported. I have a plan to deal with this, but it is 
out of scope of this series.

> 3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
> its affinity should be moved to an online CPU before completing the
> offlining.

Migration of all guest-tied IRQs is handled by the scheduler. As for the
IRQs used by Xen itself, I didn't find any with affinity to a CPU other
than CPU 0, which can't be disabled. In theory they could have a different
affinity, but that seems unlikely, given that the x86 hotplug code also
doesn't seem to do any Xen IRQ migration, AFAIU.

> 4. Race between the new hypercalls and disable/enable_nonboot_cpus():
> disable_nonboot_cpus is called, enable_nonboot_cpus() reads
> frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
> cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
> path may still run for an already-online CPU, risking use-after-free
> of per-CPU state (e.g. via free_percpu_area()) and other issues
> related to CPU_RESUME_FAILED notification.
> 

There don't seem to be any calls to disable/enable_nonboot_cpus() on 
Arm. If we take x86 as an example, then they are called with all domains 
already paused, and I don't see how paused domains can issue hypercalls.

> 
> Best regards,
> Mykola

-- 
Mykyta
Re: [PATCH v3 0/5] Implement CPU hotplug on Arm
Posted by Mykola Kvach 1 week, 2 days ago
Hi Mykyta,

Thank you for your answers.

On Mon, Oct 20, 2025 at 5:15 PM Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> On 15.10.25 20:30, Mykola Kvach wrote:
> > Hi Mykyta,
> >
> > Thanks for the series.
> >
> > It seems there might be issues here -- please take a look and let me
> > know if my concerns are valid:
> >
> > 1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
> > configuration may be lost.
>
> OPTEE and FFA are marked as unsupported.

Understood, thanks. Would it be worth documenting this?

>
> > 2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
> > table exists (is allocated) on bring-up. See
> > gicv3_lpi_allocate_pendtable() and its call chain.
>
> ITS is marked as unsupported. I have a plan to deal with this, but it is
> out of scope of this series.

Thanks for the clarification. Should we document this somewhere?

>
> > 3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
> > its affinity should be moved to an online CPU before completing the
> > offlining.
>
> All guest tied IRQ migration is handled by the scheduler. Regarding the
> irqs used by Xen, I didn't find any with affinity to other CPUs than CPU
> 0, which can't be disabled. I think theoretically it is possible for
> them to have different affinity, but it seems unlikely considering that
> x86 hotplug code also doesn't seem to do any Xen irq migration AFAIU.

What about arm_smmu_init_domain_context and its related call chains?
As far as I can see, some of these paths touch XEN_DOMCTL_* hypercalls,
and my understanding is they can be issued on any CPU. Should we add a
check that no enabled (e)SPIs owned by Xen are pinned to the offlining
CPU?
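
Roughly, the kind of check I have in mind could look like the sketch below.
It is a toy model only: struct toy_irq_desc and both helpers are
hypothetical, and a real implementation would have to walk the actual IRQ
descriptors under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-IRQ state; field names are illustrative, not Xen's irq_desc. */
struct toy_irq_desc {
    unsigned int irq;
    bool enabled;
    bool owned_by_xen;       /* false means the IRQ is routed to a guest */
    unsigned int target_cpu; /* CPU the IRQ is currently pinned to */
};

/* Count enabled, Xen-owned IRQs that target the given CPU. */
static size_t toy_count_pinned_xen_irqs(const struct toy_irq_desc *descs,
                                        size_t nr, unsigned int cpu)
{
    size_t i, count = 0;

    assert(descs != NULL || nr == 0);
    for (i = 0; i < nr; i++) {
        if (descs[i].enabled && descs[i].owned_by_xen &&
            descs[i].target_cpu == cpu)
            count++;
    }
    return count;
}

/* Hypothetical policy: refuse to offline a CPU that still has such IRQs. */
static int toy_check_cpu_offline(const struct toy_irq_desc *descs,
                                 size_t nr, unsigned int cpu)
{
    return toy_count_pinned_xen_irqs(descs, nr, cpu) == 0 ? 0 : -1;
}
```

Refusing the offline request is the simplest policy; migrating the IRQ to
an online CPU instead would be the alternative.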

>
> > 4. Race between the new hypercalls and disable/enable_nonboot_cpus():
> > disable_nonboot_cpus is called, enable_nonboot_cpus() reads
> > frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
> > cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
> > path may still run for an already-online CPU, risking use-after-free
> > of per-CPU state (e.g. via free_percpu_area()) and other issues
> > related to CPU_RESUME_FAILED notification.
> >
>
> There don't seem to be any calls to disable/enable_nonboot_cpus() on
> Arm. If we take x86 as an example, then they are called with all domains
> already paused, and I don't see how paused domains can issue hypercalls.

Agreed; this looks even less likely given that disable_* runs on CPU0 and
your new hypercalls execute on CPU0. The only plausible issue would be a
contrived case where code disables non-boot CPUs from CPU0 but enables them
from another CPU woken by a hypercall. That seems unrealistic.

>
> >
> > Best regards,
> > Mykola
>
> --
> Mykyta

Best regards,
Mykola