This series implements support for CPU hotplug/unplug on Arm. To achieve this,
several things need to be done:

1. XEN_SYSCTL_CPU_HOTPLUG_* calls implemented.
2. timer and GIC maintenance interrupts switched to static irqactions to remove
   the need for freeing them during release_irq.
3. Enabled the build of xen-hptool on Arm.

Tested on QEMU.

v2->v3:
* add docs

v1->v2:
* see individual patches

Mykyta Poturai (5):
  arm/time: Use static irqaction
  arm/gic: Use static irqaction
  arm/sysctl: Implement cpu hotplug ops
  tools: Allow building xen-hptool without CONFIG_MIGRATE
  docs: Document CPU hotplug

 config/Tools.mk.in               |  1 +
 docs/misc/cpu-hotplug.txt        | 51 ++++++++++++++++++++++++++++++++
 tools/configure                  | 30 +++++++++++++++++++
 tools/configure.ac               |  1 +
 tools/libs/guest/Makefile.common |  4 +++
 tools/misc/Makefile              |  2 +-
 xen/arch/arm/gic.c               | 11 +++++--
 xen/arch/arm/sysctl.c            | 45 ++++++++++++++++++++++++++++
 xen/arch/arm/time.c              | 21 ++++++++++---
 9 files changed, 159 insertions(+), 7 deletions(-)
 create mode 100644 docs/misc/cpu-hotplug.txt
 mode change 100755 => 100644 tools/configure

--
2.34.1
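
For readers unfamiliar with the distinction in item 2 of the cover letter, the toy model below may help. It is only a sketch, not code from the patches: every `toy_` name is hypothetical, and the struct is a stand-in rather than Xen's real `struct irqaction`. It assumes that a release path frees only descriptors it allocated itself, so a statically allocated action can be released without touching the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for an IRQ action descriptor (hypothetical, not Xen's). */
struct toy_irqaction {
    const char *name;
    bool free_on_release;   /* true only for heap-allocated descriptors */
};

/* Counts how many descriptors toy_release_irq() handed back to the heap. */
int toy_frees;

/* Dynamic registration: the descriptor is allocated on the caller's behalf. */
struct toy_irqaction *toy_request_irq(const char *name)
{
    struct toy_irqaction *a = malloc(sizeof(*a));

    if (a == NULL)
        return NULL;
    a->name = name;
    a->free_on_release = true;
    return a;
}

/* Static registration: the caller owns the storage for its whole lifetime. */
struct toy_irqaction *toy_setup_irq(struct toy_irqaction *a)
{
    assert(a != NULL);
    a->free_on_release = false;
    return a;
}

/* Release: only heap-allocated descriptors are freed. */
void toy_release_irq(struct toy_irqaction *a)
{
    assert(a != NULL);
    if (a->free_on_release) {
        free(a);
        toy_frees++;
    }
}
```

In this model, releasing an action registered through `toy_setup_irq()` never calls `free()`, which is the property the cover letter relies on for the timer and GIC maintenance interrupts.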
Hi Mykyta,

Thanks for the series.

It seems there might be issues here -- please take a look and let me
know if my concerns are valid:

1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
   configuration may be lost.
2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
   table exists (is allocated) on bring-up. See
   gicv3_lpi_allocate_pendtable() and its call chain.
3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
   its affinity should be moved to an online CPU before completing the
   offlining.
4. Race between the new hypercalls and disable/enable_nonboot_cpus():
   disable_nonboot_cpus is called, enable_nonboot_cpus() reads
   frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
   cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
   path may still run for an already-online CPU, risking use-after-free
   of per-CPU state (e.g. via free_percpu_area()) and other issues
   related to CPU_RESUME_FAILED notification.

On Fri, Oct 10, 2025 at 12:36 PM Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> This series implements support for CPU hotplug/unplug on Arm. To achieve this,
> several things need to be done:
>
> 1. XEN_SYSCTL_CPU_HOTPLUG_* calls implemented.
> 2. timer and GIC maintenance interrupts switched to static irqactions to remove
> the need for freeing them during release_irq.
> 3. Enabled the build of xen-hptool on Arm.
>
> Tested on QEMU.
>
> v2->v3:
> * add docs
>
> v1->v2:
> * see individual patches
>
> Mykyta Poturai (5):
>   arm/time: Use static irqaction
>   arm/gic: Use static irqaction
>   arm/sysctl: Implement cpu hotplug ops
>   tools: Allow building xen-hptool without CONFIG_MIGRATE
>   docs: Document CPU hotplug
>
>  config/Tools.mk.in               |  1 +
>  docs/misc/cpu-hotplug.txt        | 51 ++++++++++++++++++++++++++++++++
>  tools/configure                  | 30 +++++++++++++++++++
>  tools/configure.ac               |  1 +
>  tools/libs/guest/Makefile.common |  4 +++
>  tools/misc/Makefile              |  2 +-
>  xen/arch/arm/gic.c               | 11 +++++--
>  xen/arch/arm/sysctl.c            | 45 ++++++++++++++++++++++++++++
>  xen/arch/arm/time.c              | 21 ++++++++++---
>  9 files changed, 159 insertions(+), 7 deletions(-)
>  create mode 100644 docs/misc/cpu-hotplug.txt
>  mode change 100755 => 100644 tools/configure
>
> --
> 2.34.1
>

Best regards,
Mykola
On 15.10.25 20:30, Mykola Kvach wrote:
> Hi Mykyta,
>
> Thanks for the series.
>
> It seems there might be issues here -- please take a look and let me
> know if my concerns are valid:
>
> 1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
> configuration may be lost.

OPTEE and FFA are marked as unsupported.

> 2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
> table exists (is allocated) on bring-up. See
> gicv3_lpi_allocate_pendtable() and its call chain.

ITS is marked as unsupported. I have a plan to deal with this, but it is
out of scope of this series.

> 3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
> its affinity should be moved to an online CPU before completing the
> offlining.

All guest tied IRQ migration is handled by the scheduler. Regarding the
irqs used by Xen, I didn't find any with affinity to other CPUs than CPU
0, which can't be disabled. I think theoretically it is possible for
them to have different affinity, but it seems unlikely considering that
x86 hotplug code also doesn't seem to do any Xen irq migration AFAIU.

> 4. Race between the new hypercalls and disable/enable_nonboot_cpus():
> disable_nonboot_cpus is called, enable_nonboot_cpus() reads
> frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
> cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
> path may still run for an already-online CPU, risking use-after-free
> of per-CPU state (e.g. via free_percpu_area()) and other issues
> related to CPU_RESUME_FAILED notification.
>

There don't seem to be any calls to disable/enable_nonboot_cpus() on
Arm. If we take x86 as an example, then they are called with all domains
already paused, and I don't see how paused domains can issue hypercalls.

>
> Best regards,
> Mykola

--
Mykyta
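
The interleaving discussed under point 4 can be modelled with a small single-threaded sketch. This is a toy, not Xen code: all `toy_` names are hypothetical, the "hypercall" is simulated by a callback invoked in the window between reading the frozen set and calling the bring-up function, and the model only counts how often a failure path would be reached for a CPU that is in fact online. Whether that window is reachable at all is exactly what the reply above disputes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_EBUSY   16

bool toy_online[TOY_NR_CPUS] = { true, true, true, true };
bool toy_frozen[TOY_NR_CPUS];

/* Number of times the failure path ran for a CPU that was actually online. */
int toy_resume_failed_on_online;

/* Bring a CPU up; fails with -TOY_EBUSY if it is already online. */
int toy_cpu_up(unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    if (toy_online[cpu])
        return -TOY_EBUSY;
    toy_online[cpu] = true;
    return 0;
}

/* Offline every CPU except CPU 0 and remember which ones were frozen. */
void toy_disable_nonboot_cpus(void)
{
    for (unsigned int cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
        if (toy_online[cpu]) {
            toy_online[cpu] = false;
            toy_frozen[cpu] = true;
        }
    }
}

/*
 * Re-online the frozen CPUs. 'interleaved' (may be NULL) runs in the window
 * between reading the frozen set and calling toy_cpu_up().
 */
void toy_enable_nonboot_cpus(void (*interleaved)(void))
{
    for (unsigned int cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
        if (!toy_frozen[cpu])
            continue;
        if (interleaved != NULL)
            interleaved();
        if (toy_cpu_up(cpu) != 0 && toy_online[cpu])
            toy_resume_failed_on_online++;  /* failure path, live CPU */
        toy_frozen[cpu] = false;
    }
}

/* Simulated hotplug request that onlines CPU 1 behind the resume path's back. */
void toy_hotplug_hypercall_cpu1(void)
{
    (void)toy_cpu_up(1);
}
```

Calling `toy_disable_nonboot_cpus()` followed by `toy_enable_nonboot_cpus(toy_hotplug_hypercall_cpu1)` reaches the failure path once, for CPU 1, while passing `NULL` never does. In the model, the problem disappears as soon as the window is closed, which matches the argument that paused domains cannot issue the hypercall.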
Hi Mykyta,

Thank you for your answers

On Mon, Oct 20, 2025 at 5:15 PM Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> On 15.10.25 20:30, Mykola Kvach wrote:
> > Hi Mykyta,
> >
> > Thanks for the series.
> >
> > It seems there might be issues here -- please take a look and let me
> > know if my concerns are valid:
> >
> > 1. FF-A notification IRQ: after a CPU down->up cycle the IRQ
> > configuration may be lost.
>
> OPTEE and FFA are marked as unsupported.

Understood, thanks. Would it be worth documenting this?

>
> > 2. GICv3 LPIs: a CPU may fail to come back up unless its LPI pending
> > table exists (is allocated) on bring-up. See
> > gicv3_lpi_allocate_pendtable() and its call chain.
>
> ITS is marked as unsupported. I have a plan to deal with this, but it is
> out of scope of this series.

Thanks for the clarification. Should we document this somewhere?

>
> > 3. IRQ migration on CPU down: if an IRQ targets a CPU being offlined,
> > its affinity should be moved to an online CPU before completing the
> > offlining.
>
> All guest tied IRQ migration is handled by the scheduler. Regarding the
> irqs used by Xen, I didn't find any with affinity to other CPUs than CPU
> 0, which can't be disabled. I think theoretically it is possible for
> them to have different affinity, but it seems unlikely considering that
> x86 hotplug code also doesn't seem to do any Xen irq migration AFAIU.

What about arm_smmu_init_domain_context and its related call chains?
As far as I can see, some of these paths touch XEN_DOMCTL_* hypercalls,
and my understanding is they can be issued on any CPU.

Should we add a check that no enabled (e)SPIs owned by Xen are pinned
to the offlining CPU?

>
> > 4. Race between the new hypercalls and disable/enable_nonboot_cpus():
> > disable_nonboot_cpus is called, enable_nonboot_cpus() reads
> > frozen_cpus, and before it calls cpu_up() a hypercall onlines the CPU.
> > cpu_up() then fails as "already online", but the CPU_RESUME_FAILED
> > path may still run for an already-online CPU, risking use-after-free
> > of per-CPU state (e.g. via free_percpu_area()) and other issues
> > related to CPU_RESUME_FAILED notification.
> >
>
> There don't seem to be any calls to disable/enable_nonboot_cpus() on
> Arm. If we take x86 as an example, then they are called with all domains
> already paused, and I don't see how paused domains can issue hypercalls.

Agreed; this looks even less likely given that disable_* runs on CPU0
and your new hypercalls execute on CPU0. The only plausible issue would
be a contrived case where code disables non-boot CPUs from CPU0 but
enables them from another CPU woken by a hypercall. That seems
unrealistic.

>
> >
> > Best regards,
> > Mykola
>
> --
> Mykyta

Best regards,
Mykola
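
The check proposed under point 3 could look roughly like the sketch below. It is a toy model under stated assumptions, not a patch: the `toy_irq` struct, its fields, and `toy_cpu_offline_allowed()` are all hypothetical, affinity is modelled as a 32-bit mask, and "pinned" is taken to mean that the offlining CPU is the only CPU in the mask. A real implementation would have to walk Xen's actual IRQ descriptors under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IRQ descriptor (hypothetical; not Xen's struct irq_desc). */
struct toy_irq {
    unsigned int irq;
    bool enabled;
    bool owned_by_xen;  /* false means the IRQ is routed to a guest */
    uint32_t affinity;  /* bit n set => IRQ may be delivered to CPU n */
};

/*
 * Return true if 'cpu' can go offline without stranding an interrupt,
 * i.e. no enabled, Xen-owned IRQ has 'cpu' as its only target.
 */
bool toy_cpu_offline_allowed(const struct toy_irq *irqs, size_t n,
                             unsigned int cpu)
{
    uint32_t only_this_cpu;

    assert(cpu < 32);
    only_this_cpu = UINT32_C(1) << cpu;

    for (size_t i = 0; i < n; i++) {
        const struct toy_irq *d = &irqs[i];

        if (d->enabled && d->owned_by_xen && d->affinity == only_this_cpu)
            return false;
    }
    return true;
}
```

Refusing the offline request is the simplest policy; migrating the affinity to another online CPU instead would be the alternative raised earlier in the thread.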