This series implements support for CPU hotplug/unplug on Arm. To achieve this,
several things need to be done:

1. XEN_SYSCTL_CPU_HOTPLUG_* calls implemented on Arm64.
2. Enabled building of xen-hptool.
3. Migration of irqs from dying CPUs implemented.

Tested on QEMU.

v4->v5:
* drop merged patches
* combine "smp: Move cpu_up/down helpers to common code" with
  "arm/sysctl: Implement cpu hotplug ops"
* see individual patches

v3->v4:
* add irq migration patches
* see individual patches

v2->v3:
* add docs

v1->v2:
* see individual patches

Mykyta Poturai (5):
  arm/irq: Keep track of irq affinities
  arm/irq: Migrate IRQs during CPU up/down operations
  arm/sysctl: Implement cpu hotplug ops
  tools: Allow building xen-hptool without CONFIG_MIGRATE
  docs: Document CPU hotplug

 SUPPORT.md                       |  1 +
 docs/misc/cpu-hotplug.txt        | 50 +++++++++++++++++++++++++
 tools/libs/guest/Makefile.common |  2 +-
 tools/misc/Makefile              |  2 +-
 xen/arch/arm/Kconfig             |  1 +
 xen/arch/arm/gic-vgic.c          |  2 +
 xen/arch/arm/include/asm/irq.h   |  2 +
 xen/arch/arm/irq.c               | 63 +++++++++++++++++++++++++++++++-
 xen/arch/arm/smp.c               |  9 +++++
 xen/arch/arm/smpboot.c           |  6 +++
 xen/arch/arm/vgic.c              | 14 ++++++-
 xen/arch/ppc/stubs.c             |  4 ++
 xen/arch/riscv/stubs.c           |  5 +++
 xen/arch/x86/Kconfig             |  1 +
 xen/arch/x86/include/asm/smp.h   |  3 --
 xen/arch/x86/smp.c               | 33 ++---------------
 xen/arch/x86/sysctl.c            | 12 ++----
 xen/common/Kconfig               |  3 ++
 xen/common/smp.c                 | 34 +++++++++++++++++
 xen/common/sysctl.c              | 45 +++++++++++++++++++++++
 xen/include/xen/smp.h            |  4 ++
 21 files changed, 248 insertions(+), 48 deletions(-)
 create mode 100644 docs/misc/cpu-hotplug.txt

-- 
2.51.2

Currently on Arm the desc->affinity mask of an irq is never updated, which
makes it hard to know the actual affinity of an interrupt.

Fix this by updating the field in irq_set_affinity.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v4->v5:
* add locking

v3->v4:
* patch introduced
---
 xen/arch/arm/gic-vgic.c |  2 ++
 xen/arch/arm/irq.c      |  9 +++++++--
 xen/arch/arm/vgic.c     | 14 ++++++++++++--
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -XXX,XX +XXX,XX @@ static void gic_update_one_lr(struct vcpu *v, int i)
             if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             {
                 struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
+                spin_lock(&p->desc->lock);
                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
+                spin_unlock(&p->desc->lock);
                 clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
             }
         }
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -XXX,XX +XXX,XX @@ static inline struct domain *irq_get_domain(struct irq_desc *desc)
     return irq_get_guest_info(desc)->d;
 }
 
+/* Must be called with desc->lock held */
 void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-    if ( desc != NULL )
-        desc->handler->set_affinity(desc, mask);
+    if ( desc == NULL )
+        return;
+
+    ASSERT(spin_is_locked(&desc->lock));
+    cpumask_copy(desc->affinity, mask);
+    desc->handler->set_affinity(desc, mask);
 }
 
 int request_irq(unsigned int irq, unsigned int irqflags,
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -XXX,XX +XXX,XX @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
 
     if ( list_empty(&p->inflight) )
     {
+        spin_lock(&p->desc->lock);
         irq_set_affinity(p->desc, cpumask_of(new->processor));
+        spin_unlock(&p->desc->lock);
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return true;
     }
@@ -XXX,XX +XXX,XX @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     if ( !list_empty(&p->lr_queue) )
     {
         vgic_remove_irq_from_queues(old, p);
+        spin_lock(&p->desc->lock);
         irq_set_affinity(p->desc, cpumask_of(new->processor));
+        spin_unlock(&p->desc->lock);
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         vgic_inject_irq(new->domain, new, irq, true);
         return true;
@@ -XXX,XX +XXX,XX @@ void arch_move_irqs(struct vcpu *v)
     struct domain *d = v->domain;
     struct pending_irq *p;
     struct vcpu *v_target;
+    unsigned long flags;
     int i;
 
     /*
@@ -XXX,XX +XXX,XX @@ void arch_move_irqs(struct vcpu *v)
         p = irq_to_pending(v_target, virq);
 
         if ( v_target == v && !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+        {
+            if ( !p->desc )
+                continue;
+            spin_lock_irqsave(&p->desc->lock, flags);
             irq_set_affinity(p->desc, cpu_mask);
+            spin_unlock_irqrestore(&p->desc->lock, flags);
+        }
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, unsigned int n)
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
         if ( p->desc != NULL )
         {
-            irq_set_affinity(p->desc, cpumask_of(v_target->processor));
             spin_lock_irqsave(&p->desc->lock, flags);
+            irq_set_affinity(p->desc, cpumask_of(v_target->processor));
             /*
              * The irq cannot be a PPI, we only support delivery of SPIs
              * to guests.
@@ -XXX,XX +XXX,XX @@ void vgic_check_inflight_irqs_pending(struct vcpu *v, unsigned int rank, uint32_
  * indent-tabs-mode: nil
  * End:
  */
-
-- 
2.51.2

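The contract this patch introduces (the setter asserts that the caller already holds the descriptor lock, and every caller wraps the call in lock/unlock) can be sketched outside Xen roughly as follows. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, the lock is a plain flag rather than a real spinlock, and `hw_route` merely stands in for what `desc->handler->set_affinity()` would program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are Xen types. */
struct toy_lock {
    bool held;
};

struct toy_desc {
    struct toy_lock lock;
    unsigned long affinity;  /* bookkeeping copy, like desc->affinity */
    unsigned long hw_route;  /* stands in for the hardware routing */
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Must be called with desc->lock held, mirroring the new contract. */
static void toy_set_affinity(struct toy_desc *desc, unsigned long mask)
{
    if ( desc == NULL )
        return;

    assert(desc->lock.held);
    desc->affinity = mask;  /* record the requested mask first */
    desc->hw_route = mask;  /* then apply it */
}

/* Caller-side pattern: check for NULL before touching the lock. */
static void toy_retarget(struct toy_desc *desc, unsigned int cpu)
{
    if ( desc == NULL )
        return;

    toy_lock_acquire(&desc->lock);
    toy_set_affinity(desc, 1UL << cpu);
    toy_lock_release(&desc->lock);
}
```

Checking the descriptor for NULL before taking its lock is the same concern the arch_move_irqs() hunk addresses with its `if ( !p->desc ) continue;`.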
Move IRQs from a dying CPU to the online ones when a CPU is being offlined.
When onlining, rebalance all IRQs in a round-robin fashion. Guest-bound IRQs
are already handled by the scheduler in the process of moving vCPUs to active
pCPUs, so we only need to handle IRQs used by Xen itself.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v4->v5:
* handle CPU onlining as well
* more comments
* fix crash when ESPI is disabled
* don't assume CPU 0 is a boot CPU
* use unsigned int for irq number
* remove assumption that all irqs are bound to CPU 0 by default from the
  commit message

v3->v4:
* patch introduced
---
 xen/arch/arm/include/asm/irq.h |  2 ++
 xen/arch/arm/irq.c             | 54 ++++++++++++++++++++++++++++++++++
 xen/arch/arm/smpboot.c         |  6 ++++
 3 files changed, 62 insertions(+)

diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/include/asm/irq.h
+++ b/xen/arch/arm/include/asm/irq.h
@@ -XXX,XX +XXX,XX @@ bool irq_type_set_by_domain(const struct domain *d);
 void irq_end_none(struct irq_desc *irq);
 #define irq_end_none irq_end_none
 
+void rebalance_irqs(unsigned int from, bool up);
+
 #endif /* _ASM_HW_IRQ_H */
 /*
  * Local variables:
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -XXX,XX +XXX,XX @@ static int init_local_irq_data(unsigned int cpu)
     return 0;
 }
 
+static int cpu_next;
+
+static void balance_irq(int irq, unsigned int from, bool up)
+{
+    struct irq_desc *desc = irq_to_desc(irq);
+    unsigned long flags;
+
+    ASSERT(!cpumask_empty(&cpu_online_map));
+
+    spin_lock_irqsave(&desc->lock, flags);
+    if ( likely(!desc->action) )
+        goto out;
+
+    if ( likely(test_bit(_IRQ_GUEST, &desc->status) ||
+                test_bit(_IRQ_MOVE_PENDING, &desc->status)) )
+        goto out;
+
+    /*
+     * Setting affinity to a mask of multiple CPUs causes the GIC drivers to
+     * select one CPU from that mask. If the dying CPU was included in the IRQ's
+     * affinity mask, we cannot determine exactly which CPU the interrupt is
+     * currently routed to, as GIC drivers lack a concrete get_affinity API. So
+     * to be safe we must reroute it to a new, definitely online, CPU. In the
+     * case of CPU going down, we move only the interrupt that could reside on
+     * it. Otherwise, we rearrange all interrupts in a round-robin fashion.
+     */
+    if ( !up && !cpumask_test_cpu(from, desc->affinity) )
+        goto out;
+
+    cpu_next = cpumask_cycle(cpu_next, &cpu_online_map);
+    irq_set_affinity(desc, cpumask_of(cpu_next));
+
+out:
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+void rebalance_irqs(unsigned int from, bool up)
+{
+    int irq;
+
+    if ( cpumask_empty(&cpu_online_map) )
+        return;
+
+    for ( irq = NR_LOCAL_IRQS; irq < NR_IRQS; irq++ )
+        balance_irq(irq, from, up);
+
+#ifdef CONFIG_GICV3_ESPI
+    for ( irq = ESPI_BASE_INTID; irq < ESPI_MAX_INTID; irq++ )
+        balance_irq(irq, from, up);
+#endif
+}
+
 static int cpu_callback(struct notifier_block *nfb, unsigned long action,
                         void *hcpu)
 {
@@ -XXX,XX +XXX,XX @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action,
             printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
                    cpu);
         break;
+    case CPU_ONLINE:
+        rebalance_irqs(cpu, true);
     }
 
     return notifier_from_errno(rc);
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -XXX,XX +XXX,XX @@ void __cpu_disable(void)
 
     smp_mb();
 
+    /*
+     * Now that the interrupts are cleared and the CPU marked as offline,
+     * move interrupts out of it
+     */
+    rebalance_irqs(cpu, false);
+
     /* Return to caller; eventually the IPI mechanism will unwind and the
      * scheduler will drop to the idle loop, which will call stop_cpu(). */
 }
-- 
2.51.2

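To make the selection rule in balance_irq() easier to follow, here is a small standalone model of it. It is a sketch, not the Xen code: CPU masks are plain bitmasks limited to 8 CPUs, `toy_cycle()` is a hypothetical stand-in written to behave like a wrapping "next online CPU" lookup, and the `_IRQ_MOVE_PENDING` check and all locking are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8u

/*
 * Next set bit strictly after 'n' in 'online', wrapping around.
 * Hypothetical stand-in for the cpumask_cycle() call in the patch.
 */
static unsigned int toy_cycle(unsigned int n, unsigned int online)
{
    assert(online != 0 && (online >> TOY_NR_CPUS) == 0);

    for ( unsigned int step = 1; step <= TOY_NR_CPUS; step++ )
    {
        unsigned int cpu = (n + step) % TOY_NR_CPUS;

        if ( online & (1u << cpu) )
            return cpu;
    }

    return n; /* not reached: 'online' has at least one bit set */
}

struct toy_irq {
    unsigned int affinity; /* bitmask of CPUs the IRQ may be routed to */
    bool has_action;       /* false: IRQ is unused */
    bool is_guest;         /* true: the scheduler path handles it */
};

static unsigned int toy_cpu_next; /* round-robin cursor */

static void toy_balance(struct toy_irq *irq, unsigned int from, bool up,
                        unsigned int online)
{
    if ( !irq->has_action || irq->is_guest )
        return;

    /* Going down: only touch IRQs that could reside on the dying CPU. */
    if ( !up && !(irq->affinity & (1u << from)) )
        return;

    toy_cpu_next = toy_cycle(toy_cpu_next, online);
    irq->affinity = 1u << toy_cpu_next;
}
```

With CPUs 0, 1 and 3 online and CPU 2 going down, an IRQ whose mask contains CPU 2 is moved to the next online CPU after the cursor, while an IRQ bound only to CPU 0 is left alone. On the way up, every Xen-owned IRQ is reassigned in turn.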
Move XEN_SYSCTL_CPU_HOTPLUG_{ONLINE,OFFLINE} handlers to common code to
allow for enabling/disabling CPU cores at runtime on Arm64.

SMT-disable enforcement check is moved into a separate
architecture-specific function.

For now these operations only support Arm64. For proper Arm32 support,
there needs to be a mechanism to free per-cpu page tables, allocated in
init_domheap_mappings. Also, hotplug is not supported if ITS, FFA, or TEE
is enabled, as they use non-static IRQ actions.

Create a Kconfig option CPU_HOTPLUG that reflects these constraints. On X86
the option is enabled unconditionally.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v4->v5:
* move handling to common code
* rename config to CPU_HOTPLUG
* merge with "smp: Move cpu_up/down helpers to common code"

v3->v4:
* don't reimplement cpu_up/down helpers
* add Kconfig option
* fixup formatting

v2->v3:
* no changes

v1->v2:
* remove SMT ops
* remove cpu == 0 checks
* add XSM hooks
* only implement for 64bit Arm
---
 xen/arch/arm/Kconfig           |  1 +
 xen/arch/arm/smp.c             |  9 +++++++
 xen/arch/ppc/stubs.c           |  4 +++
 xen/arch/riscv/stubs.c         |  5 ++++
 xen/arch/x86/Kconfig           |  1 +
 xen/arch/x86/include/asm/smp.h |  3 ---
 xen/arch/x86/smp.c             | 33 +++----------------------
 xen/arch/x86/sysctl.c          | 12 +++------
 xen/common/Kconfig             |  3 +++
 xen/common/smp.c               | 34 +++++++++++++++++++++++++
 xen/common/sysctl.c            | 45 ++++++++++++++++++++++++++++++++++
 xen/include/xen/smp.h          |  4 +++
 12 files changed, 112 insertions(+), 42 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config ARM_64
 	def_bool y
 	depends on !ARM_32
 	select 64BIT
+	select CPU_HOTPLUG if !TEE && !FFA && !HAS_ITS
 	select HAS_FAST_MULTIPLY
 	select HAS_VPCI_GUEST_SUPPORT if PCI_PASSTHROUGH
 
diff --git a/xen/arch/arm/smp.c b/xen/arch/arm/smp.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/smp.c
+++ b/xen/arch/arm/smp.c
@@ -XXX,XX +XXX,XX @@ void smp_send_call_function_mask(const cpumask_t *mask)
     }
 }
 
+/*
+ * We currently don't support SMT on ARM so we don't need any special logic for
+ * CPU disabling
+ */
+bool arch_smt_cpu_disable(unsigned int cpu)
+{
+    return false;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/ppc/stubs.c b/xen/arch/ppc/stubs.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/ppc/stubs.c
+++ b/xen/arch/ppc/stubs.c
@@ -XXX,XX +XXX,XX @@ void smp_send_call_function_mask(const cpumask_t *mask)
     BUG_ON("unimplemented");
 }
 
+bool arch_smt_cpu_disable(unsigned int cpu)
+{
+    BUG_ON("unimplemented");
+}
 /* irq.c */
 
 void irq_ack_none(struct irq_desc *desc)
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -XXX,XX +XXX,XX @@ void smp_send_call_function_mask(const cpumask_t *mask)
     BUG_ON("unimplemented");
 }
 
+bool arch_smt_cpu_disable(unsigned int cpu)
+{
+    BUG_ON("unimplemented");
+}
+
 /* irq.c */
 
 void irq_ack_none(struct irq_desc *desc)
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -XXX,XX +XXX,XX @@ config X86
 	select ARCH_PAGING_MEMPOOL
 	select ARCH_SUPPORTS_INT128
 	imply CORE_PARKING
+	select CPU_HOTPLUG
 	select FUNCTION_ALIGNMENT_16B
 	select GENERIC_BUG_FRAME
 	select HAS_ALTERNATIVE
diff --git a/xen/arch/x86/include/asm/smp.h b/xen/arch/x86/include/asm/smp.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/include/asm/smp.h
+++ b/xen/arch/x86/include/asm/smp.h
@@ -XXX,XX +XXX,XX @@ int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm);
 
 void __stop_this_cpu(void);
 
-long cf_check cpu_up_helper(void *data);
-long cf_check cpu_down_helper(void *data);
-
 long cf_check core_parking_helper(void *data);
 bool core_parking_remove(unsigned int cpu);
 uint32_t get_cur_idle_nums(void);
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -XXX,XX +XXX,XX @@ void cf_check call_function_interrupt(void)
     smp_call_function_interrupt();
 }
 
-long cf_check cpu_up_helper(void *data)
+bool arch_smt_cpu_disable(unsigned int cpu)
 {
-    unsigned int cpu = (unsigned long)data;
-    int ret = cpu_up(cpu);
-
-    /* Have one more go on EBUSY. */
-    if ( ret == -EBUSY )
-        ret = cpu_up(cpu);
-
-    if ( !ret && !opt_smt &&
-         cpu_data[cpu].compute_unit_id == INVALID_CUID &&
-         cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 )
-    {
-        ret = cpu_down_helper(data);
-        if ( ret )
-            printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
-        else
-            ret = -EPERM;
-    }
-
-    return ret;
-}
-
-long cf_check cpu_down_helper(void *data)
-{
-    int cpu = (unsigned long)data;
-    int ret = cpu_down(cpu);
-    /* Have one more go on EBUSY. */
-    if ( ret == -EBUSY )
-        ret = cpu_down(cpu);
-    return ret;
+    return !opt_smt && cpu_data[cpu].compute_unit_id == INVALID_CUID &&
+           cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1;
 }
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -XXX,XX +XXX,XX @@ long arch_do_sysctl(
 
     case XEN_SYSCTL_cpu_hotplug:
     {
-        unsigned int cpu = sysctl->u.cpu_hotplug.cpu;
         unsigned int op = sysctl->u.cpu_hotplug.op;
         bool plug;
         long (*fn)(void *data);
@@ -XXX,XX +XXX,XX @@ long arch_do_sysctl(
         switch ( op )
         {
         case XEN_SYSCTL_CPU_HOTPLUG_ONLINE:
-            plug = true;
-            fn = cpu_up_helper;
-            hcpu = _p(cpu);
-            break;
-
         case XEN_SYSCTL_CPU_HOTPLUG_OFFLINE:
-            plug = false;
-            fn = cpu_down_helper;
-            hcpu = _p(cpu);
+            /* Handled by common code */
+            ASSERT_UNREACHABLE();
+            ret = -EOPNOTSUPP;
             break;
 
         case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -XXX,XX +XXX,XX @@ config LIBFDT
 config MEM_ACCESS_ALWAYS_ON
 	bool
 
+config CPU_HOTPLUG
+	bool
+
 config VM_EVENT
 	def_bool MEM_ACCESS_ALWAYS_ON
 	prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
diff --git a/xen/common/smp.c b/xen/common/smp.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/smp.c
+++ b/xen/common/smp.c
@@ -XXX,XX +XXX,XX @@
  * GNU General Public License for more details.
  */
 
+#include <xen/cpu.h>
 #include <asm/hardirq.h>
 #include <asm/processor.h>
 #include <xen/spinlock.h>
@@ -XXX,XX +XXX,XX @@ void smp_call_function_interrupt(void)
     irq_exit();
 }
 
+#ifdef CONFIG_CPU_HOTPLUG
+long cf_check cpu_up_helper(void *data)
+{
+    unsigned int cpu = (unsigned long)data;
+    int ret = cpu_up(cpu);
+
+    /* Have one more go on EBUSY. */
+    if ( ret == -EBUSY )
+        ret = cpu_up(cpu);
+
+    if ( !ret && arch_smt_cpu_disable(cpu) )
+    {
+        ret = cpu_down_helper(data);
+        if ( ret )
+            printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
+        else
+            ret = -EPERM;
+    }
+
+    return ret;
+}
+
+long cf_check cpu_down_helper(void *data)
+{
+    int cpu = (unsigned long)data;
+    int ret = cpu_down(cpu);
+    /* Have one more go on EBUSY. */
+    if ( ret == -EBUSY )
+        ret = cpu_down(cpu);
+    return ret;
+}
+#endif /* CONFIG_CPU_HOTPLUG */
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -XXX,XX +XXX,XX @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
             copyback = 1;
         break;
 
+#ifdef CONFIG_CPU_HOTPLUG
+    case XEN_SYSCTL_cpu_hotplug:
+    {
+        unsigned int cpu = op->u.cpu_hotplug.cpu;
+        unsigned int hp_op = op->u.cpu_hotplug.op;
+        bool plug;
+        long (*fn)(void *data);
+        void *hcpu;
+
+        switch ( hp_op )
+        {
+        case XEN_SYSCTL_CPU_HOTPLUG_ONLINE:
+            plug = true;
+            fn = cpu_up_helper;
+            hcpu = _p(cpu);
+            break;
+
+        case XEN_SYSCTL_CPU_HOTPLUG_OFFLINE:
+            plug = false;
+            fn = cpu_down_helper;
+            hcpu = _p(cpu);
+            break;
+
+        case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE:
+        case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE:
+            /* Use arch specific handlers as SMT is very arch-dependent */
+            ret = arch_do_sysctl(op, u_sysctl);
+            copyback = 0;
+            goto out;
+
+        default:
+            ret = -EOPNOTSUPP;
+            break;
+        }
+
+        if ( !ret )
+            ret = plug ? xsm_resource_plug_core(XSM_HOOK)
+                       : xsm_resource_unplug_core(XSM_HOOK);
+
+        if ( !ret )
+            ret = continue_hypercall_on_cpu(0, fn, hcpu);
+        break;
+    }
+#endif
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/smp.h
+++ b/xen/include/xen/smp.h
@@ -XXX,XX +XXX,XX @@ extern void *stack_base[NR_CPUS];
 void initialize_cpu_data(unsigned int cpu);
 int setup_cpu_root_pgt(unsigned int cpu);
 
+bool arch_smt_cpu_disable(unsigned int cpu);
+long cf_check cpu_up_helper(void *data);
+long cf_check cpu_down_helper(void *data);
+
 #endif /* __XEN_SMP_H__ */
-- 
2.51.2

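The control flow of the common helpers (retry once on -EBUSY, then consult an architecture hook and take the CPU back down if policy forbids it) can be modelled in isolation as below. This is a sketch under assumptions, not the hypervisor code: all `toy_*` names are hypothetical, the up/down operations are injected as function pointers, and the "odd CPUs must stay off" policy exists only to exercise the -EPERM path.

```c
#include <assert.h>
#include <errno.h>

typedef int (*toy_cpu_op)(unsigned int cpu);

/* "Have one more go on EBUSY", as both helpers in the patch do. */
static int toy_retry_once_on_ebusy(toy_cpu_op op, unsigned int cpu)
{
    int ret = op(cpu);

    if ( ret == -EBUSY )
        ret = op(cpu);

    return ret;
}

/* Shape of cpu_up_helper(): the policy hook plays arch_smt_cpu_disable(). */
static int toy_cpu_up_helper(toy_cpu_op up, toy_cpu_op down,
                             int (*must_stay_off)(unsigned int cpu),
                             unsigned int cpu)
{
    int ret = toy_retry_once_on_ebusy(up, cpu);

    if ( !ret && must_stay_off(cpu) )
    {
        /* The CPU came up but policy forbids it: take it down again. */
        ret = toy_retry_once_on_ebusy(down, cpu);
        if ( !ret )
            ret = -EPERM;
    }

    return ret;
}

/* Demo operations, purely illustrative. */
static unsigned int toy_calls;

static int toy_busy_first(unsigned int cpu)
{
    (void)cpu;
    return toy_calls++ == 0 ? -EBUSY : 0; /* busy on the first attempt only */
}

static int toy_always_ok(unsigned int cpu)
{
    (void)cpu;
    return 0;
}

static int toy_is_odd(unsigned int cpu)
{
    return cpu & 1; /* pretend odd CPUs are siblings that must stay off */
}
```

Keeping the policy behind one boolean hook is what lets the Arm implementation simply return false while x86 keeps its SMT check.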
With CPU hotplug sysctls implemented on Arm it becomes useful to have a
tool for calling them.

According to the commit history it seems that putting hptool under config
MIGRATE was a measure to fix IA64 build. As IA64 is no longer supported it
can now be brought back. So build it unconditionally.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v4->v5:
* make hptool always build

v3->v4:
* no changes

v2->v3:
* no changes

v1->v2:
* switch to configure from legacy config
---
 tools/libs/guest/Makefile.common | 2 +-
 tools/misc/Makefile              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libs/guest/Makefile.common b/tools/libs/guest/Makefile.common
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/guest/Makefile.common
+++ b/tools/libs/guest/Makefile.common
@@ -XXX,XX +XXX,XX @@ OBJS-y += xg_private.o
 OBJS-y += xg_domain.o
 OBJS-y += xg_suspend.o
 OBJS-y += xg_resume.o
+OBJS-y += xg_offline_page.o
 ifeq ($(CONFIG_MIGRATE),y)
 OBJS-y += xg_sr_common.o
 OBJS-$(CONFIG_X86) += xg_sr_common_x86.o
@@ -XXX,XX +XXX,XX @@ OBJS-$(CONFIG_X86) += xg_sr_save_x86_pv.o
 OBJS-$(CONFIG_X86) += xg_sr_save_x86_hvm.o
 OBJS-y += xg_sr_restore.o
 OBJS-y += xg_sr_save.o
-OBJS-y += xg_offline_page.o
 else
 OBJS-y += xg_nomigrate.o
 endif
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -XXX,XX +XXX,XX @@ INSTALL_BIN += xencov_split
 INSTALL_BIN += $(INSTALL_BIN-y)
 
 # Everything to be installed in regular sbin/
-INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmcrash
 INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx
 INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd
@@ -XXX,XX +XXX,XX @@ INSTALL_SBIN += xenwatchdogd
 INSTALL_SBIN += xen-access
 INSTALL_SBIN += xen-livepatch
 INSTALL_SBIN += xen-diag
+INSTALL_SBIN += xen-hptool
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed
-- 
2.51.2

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v4->v5:
* s/supported/implemented/
* update SUPPORT.md

v3->v4:
* update configuration section

v2->v3:
* patch introduced
---
 SUPPORT.md                |  1 +
 docs/misc/cpu-hotplug.txt | 50 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)
 create mode 100644 docs/misc/cpu-hotplug.txt

diff --git a/SUPPORT.md b/SUPPORT.md
index XXXXXXX..XXXXXXX 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -XXX,XX +XXX,XX @@ For the Cortex A77 r0p0 - r1p0, see Errata 1508412.
 ### ACPI CPU Hotplug
 
     Status, x86: Experimental
+    Status, Arm64: Experimental
 
 ### Physical Memory
 
diff --git a/docs/misc/cpu-hotplug.txt b/docs/misc/cpu-hotplug.txt
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/misc/cpu-hotplug.txt
@@ -XXX,XX +XXX,XX @@
+CPU Hotplug
+===========
+
+CPU hotplug is a feature that allows pCPU cores to be added to or removed from a
+running system without requiring a reboot. It is implemented on x86 and Arm64
+architectures.
+
+Implementation Details
+----------------------
+
+CPU hotplug is implemented through the `XEN_SYSCTL_CPU_HOTPLUG_*` sysctl calls.
+The specific calls are:
+
+- `XEN_SYSCTL_CPU_HOTPLUG_ONLINE`: Brings a pCPU online
+- `XEN_SYSCTL_CPU_HOTPLUG_OFFLINE`: Takes a pCPU offline
+- `XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE`: Enables SMT threads (x86 only)
+- `XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE`: Disables SMT threads (x86 only)
+
+All cores can be disabled, assuming hardware support, except for the boot core.
+Sysctl calls are routed to the boot core before doing any actual up/down
+operations on other cores.
+
+Configuration
+-------------
+
+The presence of the feature is controlled by CONFIG_CPU_HOTPLUG option. It is
+enabled unconditionally on x86 architecture. On Arm64, the option is enabled by
+default when ITS, FFA, and TEE configs are disabled.
+xen-hptool userspace tool is built unconditionally.
+
+Usage
+-----
+
+Disable core:
+
+$ xen-hptool cpu-offline 2
+Prepare to offline CPU 2
+(XEN) Removing cpu 2 from runqueue 0
+CPU 2 offlined successfully
+
+Enable core:
+
+$ xen-hptool cpu-online 2
+Prepare to online CPU 2
+(XEN) Bringing up CPU2
+(XEN) GICv3: CPU2: Found redistributor in region 0 @00000a004005c000
+(XEN) CPU2: Guest atomics will try 1 times before pausing the domain
+(XEN) CPU 2 booted.
+(XEN) Adding cpu 2 to runqueue 0
+CPU 2 onlined successfully
-- 
2.51.2

This series implements support for CPU hotplug/unplug on Arm. To achieve this,
several things need to be done:

1. XEN_SYSCTL_CPU_HOTPLUG_* calls implemented on Arm64.
2. Enabled building of xen-hptool.
3. Migration of irqs from dying CPUs implemented.

Tested on QEMU and R-Car Gen5 HW.

v7->v8:
* see individual patches

v6->v7:
* new patch "Kconfig: Make cpu hotplug configurable"

v5->v6:
* see individual patches

v4->v5:
* drop merged patches
* combine "smp: Move cpu_up/down helpers to common code" with
  "arm/sysctl: Implement cpu hotplug ops"
* see individual patches

v3->v4:
* add irq migration patches
* see individual patches

v2->v3:
* add docs

v1->v2:
* see individual patches

Mykyta Poturai (6):
  arm/irq: Keep track of irq affinities
  arm/irq: Migrate IRQs during CPU up/down operations
  Kconfig: Make cpu hotplug configurable
  arm/sysctl: Implement cpu hotplug ops
  tools: Allow building xen-hptool without CONFIG_MIGRATE
  docs: Document CPU hotplug

 docs/misc/cpu-hotplug.txt         |  97 ++++++++++
 tools/misc/Makefile               |   8 +-
 tools/misc/xen-hptool-x86.c       | 277 ++++++++++++++++++++++++++++
 tools/misc/xen-hptool.c           | 293 ++----------------------------
 tools/misc/xen-hptool.h           |  14 ++
 xen/arch/arm/gic-vgic.c           |   2 +
 xen/arch/arm/include/asm/irq.h    |   6 +
 xen/arch/arm/irq.c                |  69 ++++++-
 xen/arch/arm/smp.c                |   9 +
 xen/arch/arm/smpboot.c            |   7 +
 xen/arch/arm/vgic.c               |  14 +-
 xen/arch/arm/vgic/vgic-mmio-v2.c  |  11 +-
 xen/arch/arm/vgic/vgic.c          |  21 ++-
 xen/arch/ppc/stubs.c              |   4 +
 xen/arch/riscv/stubs.c            |   5 +
 xen/arch/x86/include/asm/smp.h    |   3 -
 xen/arch/x86/platform_hypercall.c |  12 ++
 xen/arch/x86/smp.c                |  35 +---
 xen/arch/x86/sysctl.c             |  25 ++-
 xen/common/Kconfig                |   8 +
 xen/common/smp.c                  |  35 ++++
 xen/common/sysctl.c               |  42 +++++
 xen/include/xen/smp.h             |   4 +
 xen/xsm/flask/hooks.c             |   2 -
 24 files changed, 660 insertions(+), 343 deletions(-)
 create mode 100644 docs/misc/cpu-hotplug.txt
 create mode 100644 tools/misc/xen-hptool-x86.c
 create mode 100644 tools/misc/xen-hptool.h

-- 
2.51.2

Currently on Arm the desc->affinity mask of an irq is never updated, which
makes it hard to know the actual affinity of an interrupt.

Fix this by updating the field in irq_set_affinity.

Changing desc->affinity requires desc->lock to be held, so add an assertion
to ensure that callers of irq_set_affinity are doing so correctly.

With desc->lock now being required for irq_set_affinity, add locking around
calls to it where it was missing.

Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
---
v6->v7:
* update commit message
* fix possible locking on null desc
* collect RBs

v5->v6:
* add missing locking around irq_set_affinity calls

v4->v5:
* add locking

v3->v4:
* patch introduced
---
 xen/arch/arm/gic-vgic.c          |  2 ++
 xen/arch/arm/irq.c               |  9 +++++++--
 xen/arch/arm/vgic.c              | 14 ++++++++++++--
 xen/arch/arm/vgic/vgic-mmio-v2.c | 11 +++++------
 xen/arch/arm/vgic/vgic.c         | 21 +++++++++++++--------
 5 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -XXX,XX +XXX,XX @@ static void gic_update_one_lr(struct vcpu *v, int i)
             if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             {
                 struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
+                spin_lock(&p->desc->lock);
                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
+                spin_unlock(&p->desc->lock);
                 clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
             }
         }
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -XXX,XX +XXX,XX @@ static inline struct domain *irq_get_domain(struct irq_desc *desc)
     return irq_get_guest_info(desc)->d;
 }
 
+/* Must be called with desc->lock held */
 void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-    if ( desc != NULL )
-        desc->handler->set_affinity(desc, mask);
+    if ( desc == NULL )
+        return;
+
+    ASSERT(spin_is_locked(&desc->lock));
+    cpumask_copy(desc->affinity, mask);
+    desc->handler->set_affinity(desc, mask);
 }
 
 int request_irq(unsigned int irq, unsigned int irqflags,
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -XXX,XX +XXX,XX @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
 
     if ( list_empty(&p->inflight) )
     {
+        spin_lock(&p->desc->lock);
         irq_set_affinity(p->desc, cpumask_of(new->processor));
+        spin_unlock(&p->desc->lock);
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return true;
     }
@@ -XXX,XX +XXX,XX @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     if ( !list_empty(&p->lr_queue) )
     {
         vgic_remove_irq_from_queues(old, p);
+        spin_lock(&p->desc->lock);
         irq_set_affinity(p->desc, cpumask_of(new->processor));
+        spin_unlock(&p->desc->lock);
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         vgic_inject_irq(new->domain, new, irq, true);
         return true;
@@ -XXX,XX +XXX,XX @@ void arch_move_irqs(struct vcpu *v)
     struct domain *d = v->domain;
     struct pending_irq *p;
     struct vcpu *v_target;
+    unsigned long flags;
     int i;
 
     /*
@@ -XXX,XX +XXX,XX @@ void arch_move_irqs(struct vcpu *v)
         p = irq_to_pending(v_target, virq);
 
         if ( v_target == v && !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+        {
+            if ( !p->desc )
+                continue;
+            spin_lock_irqsave(&p->desc->lock, flags);
             irq_set_affinity(p->desc, cpu_mask);
+            spin_unlock_irqrestore(&p->desc->lock, flags);
+        }
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, unsigned int n)
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
         if ( p->desc != NULL )
         {
-            irq_set_affinity(p->desc, cpumask_of(v_target->processor));
             spin_lock_irqsave(&p->desc->lock, flags);
+            irq_set_affinity(p->desc, cpumask_of(v_target->processor));
             /*
              * The irq cannot be a PPI, we only support delivery of SPIs
              * to guests.
@@ -XXX,XX +XXX,XX @@ void vgic_check_inflight_irqs_pending(struct vcpu *v, unsigned int rank, uint32_
  * indent-tabs-mode: nil
  * End:
  */
-
diff --git a/xen/arch/arm/vgic/vgic-mmio-v2.c b/xen/arch/arm/vgic/vgic-mmio-v2.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/vgic/vgic-mmio-v2.c
+++ b/xen/arch/arm/vgic/vgic-mmio-v2.c
@@ -XXX,XX +XXX,XX @@ static void vgic_mmio_write_target(struct vcpu *vcpu,
     for ( i = 0; i < len; i++ )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, NULL, intid + i);
+        struct irq_desc *desc = irq_to_desc(irq->hwintid);
 
-        spin_lock_irqsave(&irq->irq_lock, flags);
+        spin_lock_irqsave(&desc->lock, flags);
+        spin_lock(&irq->irq_lock);
 
         irq->targets = (val >> (i * 8)) & cpu_mask;
         if ( irq->targets )
         {
             irq->target_vcpu = vcpu->domain->vcpu[ffs(irq->targets) - 1];
             if ( irq->hw )
-            {
-                struct irq_desc *desc = irq_to_desc(irq->hwintid);
-
                 irq_set_affinity(desc, cpumask_of(irq->target_vcpu->processor));
-            }
         }
         else
             irq->target_vcpu = NULL;
 
-        spin_unlock_irqrestore(&irq->irq_lock, flags);
+        spin_unlock(&irq->irq_lock);
+        spin_unlock_irqrestore(&desc->lock, flags);
         vgic_put_irq(vcpu->domain, irq);
     }
 }
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -XXX,XX +XXX,XX @@ void arch_move_irqs(struct vcpu *v)
     {
         struct vgic_irq *irq = vgic_get_irq(d, NULL, i + VGIC_NR_PRIVATE_IRQS);
         unsigned long flags;
+        irq_desc_t *desc;
 
         if ( !irq )
             continue;
 
-        spin_lock_irqsave(&irq->irq_lock, flags);
-
-        /* Only hardware mapped vIRQs that are targeting this vCPU. */
-        if ( irq->hw && irq->target_vcpu == v)
+        if ( !irq->hw )
         {
-            irq_desc_t *desc = irq_to_desc(irq->hwintid);
+            vgic_put_irq(d, irq);
+            continue;
+        }
 
+        desc = irq_to_desc(irq->hwintid);
+        spin_lock_irqsave(&desc->lock, flags);
+        spin_lock(&irq->irq_lock);
+
+        /* Only hardware mapped vIRQs that are targeting this vCPU. */
+        if ( irq->target_vcpu == v )
             irq_set_affinity(desc, cpumask_of(v->processor));
-        }
 
-        spin_unlock_irqrestore(&irq->irq_lock, flags);
-        vgic_put_irq(d, irq);
+        spin_unlock(&irq->irq_lock);
+        spin_unlock_irqrestore(&desc->lock, flags);
     }
 }
-- 
2.51.2

Move IRQs from dying CPU to the online ones when a CPU is getting offlined. When onlining, rebalance all IRQs in a round-robin fashion. Guest-bound IRQs are already handled by scheduler in the process of moving vCPUs to active pCPUs, so we only need to handle IRQs used by Xen itself. Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> --- v7->v8: * check only existings ESPIs v6->v7: * replace ifdef with IS_ENABLED v5->v6: * don't do any balancing on boot * only do balancing when cpu hotplug is enabled v4->v5: * handle CPU onlining as well * more comments * fix crash when ESPI is disabled * don't assume CPU 0 is a boot CPU * use insigned int for irq number * remove assumption that all irqs a bound to CPU 0 by default from the commit message v3->v4: * patch introduced --- xen/arch/arm/include/asm/irq.h | 6 ++++ xen/arch/arm/irq.c | 60 ++++++++++++++++++++++++++++++++++ xen/arch/arm/smpboot.c | 7 ++++ 3 files changed, 73 insertions(+) diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/include/asm/irq.h +++ b/xen/arch/arm/include/asm/irq.h @@ -XXX,XX +XXX,XX @@ bool irq_type_set_by_domain(const struct domain *d); void irq_end_none(struct irq_desc *irq); #define irq_end_none irq_end_none +#ifdef CONFIG_CPU_HOTPLUG +void rebalance_irqs(unsigned int from, bool up); +#else +static inline void rebalance_irqs(unsigned int from, bool up) {} +#endif + #endif /* _ASM_HW_IRQ_H */ /* * Local variables: diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -XXX,XX +XXX,XX @@ static int init_local_irq_data(unsigned int cpu) return 0; } +#ifdef CONFIG_CPU_HOTPLUG +static int cpu_next; + +static void balance_irq(int irq, unsigned int from, bool up) +{ + struct irq_desc *desc = irq_to_desc(irq); + unsigned long flags; + + ASSERT(!cpumask_empty(&cpu_online_map)); + + spin_lock_irqsave(&desc->lock, flags); + if ( 
likely(!desc->action) ) + goto out; + + if ( likely(test_bit(_IRQ_GUEST, &desc->status) || + test_bit(_IRQ_MOVE_PENDING, &desc->status)) ) + goto out; + + /* + * Setting affinity to a mask of multiple CPUs causes the GIC drivers to + * select one CPU from that mask. If the dying CPU was included in the IRQ's + * affinity mask, we cannot determine exactly which CPU the interrupt is + * currently routed to, as GIC drivers lack a concrete get_affinity API. So + * to be safe we must reroute it to a new, definitely online, CPU. In the + * case of CPU going down, we move only the interrupt that could reside on + * it. Otherwise, we rearrange all interrupts in a round-robin fashion. + */ + if ( !up && !cpumask_test_cpu(from, desc->affinity) ) + goto out; + + cpu_next = cpumask_cycle(cpu_next, &cpu_online_map); + irq_set_affinity(desc, cpumask_of(cpu_next)); + +out: + spin_unlock_irqrestore(&desc->lock, flags); +} + +void rebalance_irqs(unsigned int from, bool up) +{ + int irq; + + if ( cpumask_empty(&cpu_online_map) ) + return; + + for ( irq = NR_LOCAL_IRQS; irq < NR_IRQS; irq++ ) + balance_irq(irq, from, up); + +#ifdef CONFIG_GICV3_ESPI + for ( irq = ESPI_BASE_INTID; irq < ESPI_BASE_INTID + gic_number_espis(); + irq++ ) + balance_irq(irq, from, up); +#endif +} +#endif /* CONFIG_CPU_HOTPLUG */ + static int cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) { @@ -XXX,XX +XXX,XX @@ static int cpu_callback(struct notifier_block *nfb, unsigned long action, printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n", cpu); break; + case CPU_ONLINE: + if ( IS_ENABLED(CONFIG_CPU_HOTPLUG) && + system_state >= SYS_STATE_active ) + rebalance_irqs(cpu, true); + break; } return notifier_from_errno(rc); diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/smpboot.c +++ b/xen/arch/arm/smpboot.c @@ -XXX,XX +XXX,XX @@ void __cpu_disable(void) smp_mb(); + /* + * Now that the interrupts are cleared and the 
CPU marked as offline, + * move interrupts out of it + */ + if ( IS_ENABLED(CONFIG_CPU_HOTPLUG) ) + rebalance_irqs(cpu, false); + /* Return to caller; eventually the IPI mechanism will unwind and the * scheduler will drop to the idle loop, which will call stop_cpu(). */ } -- 2.51.2
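The selection policy in balance_irq() above can be tried outside the hypervisor with a small model. The following is only a sketch under stated assumptions: the online map and the per-IRQ affinity are modelled as 32-bit bitmasks, next_online_cpu() is a hypothetical stand-in for cpumask_cycle(), and struct irq_sketch, cpu_next_sketch and balance_irq_sketch() are illustrative names that do not exist in Xen. Locking and the _IRQ_MOVE_PENDING check are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8u /* hypothetical size of the modelled system */

/* Stand-in for cpumask_cycle(): next set bit after n, wrapping around. */
static unsigned int next_online_cpu(unsigned int n, uint32_t online)
{
    assert(online != 0); /* mirrors the ASSERT on cpu_online_map */

    for ( unsigned int i = 1; i <= NR_CPUS_SKETCH; i++ )
    {
        unsigned int cpu = (n + i) % NR_CPUS_SKETCH;

        if ( online & (1u << cpu) )
            return cpu;
    }

    return n; /* not reached when online != 0 */
}

/* Hypothetical, much reduced model of struct irq_desc. */
struct irq_sketch {
    bool has_action;   /* models desc->action != NULL */
    bool guest;        /* models the _IRQ_GUEST status bit */
    uint32_t affinity; /* models desc->affinity */
};

static unsigned int cpu_next_sketch;

/* Returns true when the IRQ was rerouted to a new CPU. */
static bool balance_irq_sketch(struct irq_sketch *irq, unsigned int from,
                               bool up, uint32_t online)
{
    /* Unused and guest-bound IRQs are left alone. */
    if ( !irq->has_action || irq->guest )
        return false;

    /* On CPU down, only IRQs that may sit on the dying CPU are moved. */
    if ( !up && !(irq->affinity & (1u << from)) )
        return false;

    /* Round-robin over the CPUs that are online right now. */
    cpu_next_sketch = next_online_cpu(cpu_next_sketch, online);
    irq->affinity = 1u << cpu_next_sketch;

    return true;
}
```

In this model the rotating cursor is shared by all IRQs, as cpu_next is in the patch, so consecutive moves land on different online CPUs instead of piling onto one.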
For the purposes of certification, we want as little code as possible to be unconditionally compiled in. Make CPU hotplug and SMT operations configurable to ease the process. This will also help with introducing CPU hotplug on Arm, where it needs to be configurable. Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> --- v7->v8: * fix style * s/CPU_HOTPLUG/CPU_ONLINE_OFFLINE/ v6->v7: * new patch --- xen/arch/x86/platform_hypercall.c | 12 ++++++++++++ xen/arch/x86/smp.c | 3 +++ xen/arch/x86/sysctl.c | 12 ++++++++++++ xen/common/Kconfig | 8 ++++++++ 4 files changed, 35 insertions(+) diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -XXX,XX +XXX,XX @@ ret_t do_platform_op( { int cpu = op->u.cpu_ol.cpuid; + if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) + { + ret = -EOPNOTSUPP; + break; + } + ret = xsm_resource_plug_core(XSM_HOOK); if ( ret ) break; @@ -XXX,XX +XXX,XX @@ ret_t do_platform_op( { int cpu = op->u.cpu_ol.cpuid; + if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) + { + ret = -EOPNOTSUPP; + break; + } + ret = xsm_resource_unplug_core(XSM_HOOK); if ( ret ) break; diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/smp.c +++ b/xen/arch/x86/smp.c @@ -XXX,XX +XXX,XX @@ void cf_check call_function_interrupt(void) smp_call_function_interrupt(); } +#ifdef CONFIG_CPU_ONLINE_OFFLINE long cf_check cpu_up_helper(void *data) { unsigned int cpu = (unsigned long)data; @@ -XXX,XX +XXX,XX @@ long cf_check cpu_down_helper(void *data) { int cpu = (unsigned long)data; int ret = cpu_down(cpu); + /* Have one more go on EBUSY. 
*/ if ( ret == -EBUSY ) ret = cpu_down(cpu); return ret; } +#endif diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -XXX,XX +XXX,XX @@ static long cf_check smt_up_down_helper(void *data) unsigned int cpu, sibling_mask = boot_cpu_data.x86_num_siblings - 1; int ret = 0; + if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) + { + ASSERT_UNREACHABLE(); + return -EOPNOTSUPP; + } + opt_smt = up; for_each_present_cpu ( cpu ) @@ -XXX,XX +XXX,XX @@ long arch_do_sysctl( long (*fn)(void *data); void *hcpu; + if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) + { + ret = -EOPNOTSUPP; + break; + } + switch ( op ) { case XEN_SYSCTL_CPU_HOTPLUG_ONLINE: diff --git a/xen/common/Kconfig b/xen/common/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -XXX,XX +XXX,XX @@ config SYSTEM_SUSPEND If unsure, say N. +config CPU_ONLINE_OFFLINE + bool "CPU online/offline support" + depends on X86 + default y + help + Enable support for bringing CPUs online and offline at runtime. On + X86 this is required for disabling SMT. + menu "Supported hypercall interfaces" visible if EXPERT -- 2.51.2
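The patch above gates the hypercall paths with IS_ENABLED() rather than #ifdef, which keeps the guarded code visible to the compiler even when the option is off. As a rough illustration of how such a macro can evaluate to 0 for an undefined option, here is a self-contained sketch. The SK_* macros and CONFIG_SKETCH_* names are made up for this example and only imitate the usual kconfig-style trick; they are not copied from Xen's headers.

```c
#include <errno.h>

/*
 * If the option is defined to 1, SK_PLACEHOLDER_1 expands to "0," and
 * shifts a literal 1 into the second argument slot. Otherwise the pasted
 * token is junk that stays in the first slot and the second slot is 0.
 */
#define SK_PLACEHOLDER_1 0,
#define SK_SECOND(ignored, val, ...) val
#define SK_ENABLED3(arg1_or_junk) SK_SECOND(arg1_or_junk 1, 0, 0)
#define SK_ENABLED2(value) SK_ENABLED3(SK_PLACEHOLDER_##value)
#define SK_ENABLED1(cfg) SK_ENABLED2(cfg)
#define SK_IS_ENABLED(option) SK_ENABLED1(option)

#define CONFIG_SKETCH_ON 1
/* CONFIG_SKETCH_OFF is intentionally left undefined. */

/* Modelled operation guarded by an option that is switched on. */
static int cpu_op_when_enabled(void)
{
    if ( !SK_IS_ENABLED(CONFIG_SKETCH_ON) )
        return -EOPNOTSUPP;

    return 0;
}

/* Same operation guarded by an option that is not defined at all. */
static int cpu_op_when_disabled(void)
{
    if ( !SK_IS_ENABLED(CONFIG_SKETCH_OFF) )
        return -EOPNOTSUPP;

    return 0; /* still parsed and type-checked, then dropped as dead code */
}
```

Both functions are compiled in full, so a stale identifier in the disabled branch would still be caught at build time, which is the usual argument for this style over #ifdef.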
The SMT-disable enforcement check is moved into a separate architecture-specific function. For now these operations only support Arm64. For proper Arm32 support, there needs to be a mechanism to free per-cpu page tables, allocated in init_domheap_mappings. Also, hotplug is not supported if ITS is enabled, and only partially supported if FFA or TEE is enabled, as they use non-static IRQ actions. Remove ifdef guards for x86 in flask, as cpu hotplug is now supported on more architectures. Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> --- v7->v8: * simplify dependencies of config CPU_ONLINE_OFFLINE v6->v7: * use IS_ENABLED instead of ifdef in more places * remove unneeded variables * more explicit fallthrough in do_sysctl v5->v6: * fix style issues * rename arch_smt_cpu_disable -> arch_cpu_can_stay_online and invert the logic * use IS_ENABLED instead of ifdef * remove explicit list of arch-specific SYSCTL_CPU_HOTPLUG_* options from the common handler * fix flask issue v4->v5: * move handling to common code * rename config to CPU_HOTPLUG * merge with "smp: Move cpu_up/down helpers to common code" v3->v4: * don't reimplement cpu_up/down helpers * add Kconfig option * fixup formatting v2->v3: * no changes v1->v2: * remove SMT ops * remove cpu == 0 checks * add XSM hooks * only implement for 64bit Arm Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> --- xen/arch/arm/smp.c | 9 ++++++++ xen/arch/ppc/stubs.c | 4 ++++ xen/arch/riscv/stubs.c | 5 ++++ xen/arch/x86/include/asm/smp.h | 3 --- xen/arch/x86/smp.c | 34 +++------------------------ xen/arch/x86/sysctl.c | 13 ++++------- xen/common/Kconfig | 6 ++--- xen/common/smp.c | 35 ++++++++++++++++++++++++++++ xen/common/sysctl.c | 42 ++++++++++++++++++++++++++++++++++ xen/include/xen/smp.h | 4 ++++ xen/xsm/flask/hooks.c | 2 -- 11 files changed, 109 insertions(+), 48 deletions(-) diff --git a/xen/arch/arm/smp.c b/xen/arch/arm/smp.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/smp.c +++ b/xen/arch/arm/smp.c @@ -XXX,XX +XXX,XX @@ 
void smp_send_call_function_mask(const cpumask_t *mask) } } +/* + * We currently don't support SMT on ARM so we don't need any special logic for + * CPU disabling + */ +inline bool arch_cpu_can_stay_online(unsigned int cpu) +{ + return true; +} + /* * Local variables: * mode: C diff --git a/xen/arch/ppc/stubs.c b/xen/arch/ppc/stubs.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/ppc/stubs.c +++ b/xen/arch/ppc/stubs.c @@ -XXX,XX +XXX,XX @@ void smp_send_call_function_mask(const cpumask_t *mask) BUG_ON("unimplemented"); } +bool arch_cpu_can_stay_online(unsigned int cpu) +{ + BUG_ON("unimplemented"); +} /* irq.c */ void irq_ack_none(struct irq_desc *desc) diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/riscv/stubs.c +++ b/xen/arch/riscv/stubs.c @@ -XXX,XX +XXX,XX @@ void smp_send_call_function_mask(const cpumask_t *mask) BUG_ON("unimplemented"); } +bool arch_cpu_can_stay_online(unsigned int cpu) +{ + BUG_ON("unimplemented"); +} + /* irq.c */ void irq_ack_none(struct irq_desc *desc) diff --git a/xen/arch/x86/include/asm/smp.h b/xen/arch/x86/include/asm/smp.h index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/include/asm/smp.h +++ b/xen/arch/x86/include/asm/smp.h @@ -XXX,XX +XXX,XX @@ int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm); void __stop_this_cpu(void); -long cf_check cpu_up_helper(void *data); -long cf_check cpu_down_helper(void *data); - long cf_check core_parking_helper(void *data); bool core_parking_remove(unsigned int cpu); uint32_t get_cur_idle_nums(void); diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/smp.c +++ b/xen/arch/x86/smp.c @@ -XXX,XX +XXX,XX @@ void cf_check call_function_interrupt(void) } #ifdef CONFIG_CPU_ONLINE_OFFLINE -long cf_check cpu_up_helper(void *data) +bool arch_cpu_can_stay_online(unsigned int cpu) { - unsigned int cpu = (unsigned long)data; - int ret = cpu_up(cpu); - - /* Have one more go on EBUSY. 
*/ - if ( ret == -EBUSY ) - ret = cpu_up(cpu); - - if ( !ret && !opt_smt && - cpu_data[cpu].compute_unit_id == INVALID_CUID && - cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 ) - { - ret = cpu_down_helper(data); - if ( ret ) - printk("Could not re-offline CPU%u (%d)\n", cpu, ret); - else - ret = -EPERM; - } - - return ret; -} - -long cf_check cpu_down_helper(void *data) -{ - int cpu = (unsigned long)data; - int ret = cpu_down(cpu); - - /* Have one more go on EBUSY. */ - if ( ret == -EBUSY ) - ret = cpu_down(cpu); - return ret; + return opt_smt || cpu_data[cpu].compute_unit_id != INVALID_CUID || + cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) <= 1; } #endif diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -XXX,XX +XXX,XX @@ long arch_do_sysctl( case XEN_SYSCTL_cpu_hotplug: { - unsigned int cpu = sysctl->u.cpu_hotplug.cpu; unsigned int op = sysctl->u.cpu_hotplug.op; bool plug; long (*fn)(void *data); @@ -XXX,XX +XXX,XX @@ long arch_do_sysctl( if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) { + ASSERT_UNREACHABLE(); ret = -EOPNOTSUPP; break; } @@ -XXX,XX +XXX,XX @@ long arch_do_sysctl( switch ( op ) { case XEN_SYSCTL_CPU_HOTPLUG_ONLINE: - plug = true; - fn = cpu_up_helper; - hcpu = _p(cpu); - break; - case XEN_SYSCTL_CPU_HOTPLUG_OFFLINE: - plug = false; - fn = cpu_down_helper; - hcpu = _p(cpu); + /* Handled by common code */ + ASSERT_UNREACHABLE(); + ret = -EOPNOTSUPP; break; case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE: diff --git a/xen/common/Kconfig b/xen/common/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -XXX,XX +XXX,XX @@ config SYSTEM_SUSPEND If unsure, say N. config CPU_ONLINE_OFFLINE - bool "CPU online/offline support" - depends on X86 - default y + bool "CPU online/offline support" if EXPERT + depends on X86 || (ARM_64 && !HAS_ITS) + default X86 help Enable support for bringing CPUs online and offline at runtime. 
On X86 this is required for disabling SMT. diff --git a/xen/common/smp.c b/xen/common/smp.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/smp.c +++ b/xen/common/smp.c @@ -XXX,XX +XXX,XX @@ * GNU General Public License for more details. */ +#include <xen/cpu.h> #include <asm/hardirq.h> #include <asm/processor.h> #include <xen/spinlock.h> @@ -XXX,XX +XXX,XX @@ void smp_call_function_interrupt(void) irq_exit(); } +#ifdef CONFIG_CPU_ONLINE_OFFLINE +long cf_check cpu_up_helper(void *data) +{ + unsigned int cpu = (unsigned long)data; + int ret = cpu_up(cpu); + + /* Have one more go on EBUSY. */ + if ( ret == -EBUSY ) + ret = cpu_up(cpu); + + if ( !ret && !arch_cpu_can_stay_online(cpu) ) + { + ret = cpu_down_helper(data); + if ( ret ) + printk("Could not re-offline CPU%u (%d)\n", cpu, ret); + else + ret = -EPERM; + } + + return ret; +} + +long cf_check cpu_down_helper(void *data) +{ + unsigned int cpu = (unsigned long)data; + int ret = cpu_down(cpu); + + /* Have one more go on EBUSY. */ + if ( ret == -EBUSY ) + ret = cpu_down(cpu); + return ret; +} +#endif /* CONFIG_CPU_ONLINE_OFFLINE */ + /* * Local variables: * mode: C diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -XXX,XX +XXX,XX @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) copyback = 1; break; + case XEN_SYSCTL_cpu_hotplug: + { + unsigned int hp_op = op->u.cpu_hotplug.op; + bool plug; + long (*fn)(void *data); + void *hcpu = _p(op->u.cpu_hotplug.cpu); + + ret = -EOPNOTSUPP; + if ( !IS_ENABLED(CONFIG_CPU_ONLINE_OFFLINE) ) + break; + + switch ( hp_op ) + { + case XEN_SYSCTL_CPU_HOTPLUG_ONLINE: + plug = true; + fn = cpu_up_helper; + break; + + case XEN_SYSCTL_CPU_HOTPLUG_OFFLINE: + plug = false; + fn = cpu_down_helper; + break; + + default: + fn = NULL; + break; + } + + if ( fn ) + { + ret = plug ? 
xsm_resource_plug_core(XSM_HOOK) + : xsm_resource_unplug_core(XSM_HOOK); + + if ( !ret ) + ret = continue_hypercall_on_cpu(0, fn, hcpu); + + break; + } + } + + /* Use the arch handler for cases not handled here */ + fallthrough; default: ret = arch_do_sysctl(op, u_sysctl); copyback = 0; diff --git a/xen/include/xen/smp.h b/xen/include/xen/smp.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/smp.h +++ b/xen/include/xen/smp.h @@ -XXX,XX +XXX,XX @@ extern void *stack_base[NR_CPUS]; void initialize_cpu_data(unsigned int cpu); int setup_cpu_root_pgt(unsigned int cpu); +bool arch_cpu_can_stay_online(unsigned int cpu); +long cf_check cpu_up_helper(void *data); +long cf_check cpu_down_helper(void *data); + #endif /* __XEN_SMP_H__ */ diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index XXXXXXX..XXXXXXX 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -XXX,XX +XXX,XX @@ static int cf_check flask_sysctl(int cmd) case XEN_SYSCTL_getdomaininfolist: case XEN_SYSCTL_page_offline_op: case XEN_SYSCTL_scheduler_op: -#ifdef CONFIG_X86 case XEN_SYSCTL_cpu_hotplug: -#endif return 0; case XEN_SYSCTL_tbuf_op: -- 2.51.2
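The control flow of the common cpu_up_helper() and cpu_down_helper() above (one retry on -EBUSY, then a re-offline with -EPERM when the architecture hook vetoes the CPU) can be exercised with a stand-alone model. This is a sketch only: fake_cpu_up(), fake_cpu_down(), fake_cpu_can_stay_online() and the busy_budget knob are hypothetical stand-ins for cpu_up(), cpu_down() and arch_cpu_can_stay_online(), and the printk on a failed re-offline is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical test doubles; the real operations live inside Xen. */
static int busy_budget;              /* how many calls still fail with -EBUSY */
static bool fake_online[4];          /* modelled online state per CPU */
static bool fake_smt_allowed = true; /* modelled verdict of the arch hook */

static int fake_cpu_up(unsigned int cpu)
{
    if ( busy_budget > 0 )
    {
        busy_budget--;
        return -EBUSY;
    }

    fake_online[cpu] = true;
    return 0;
}

static int fake_cpu_down(unsigned int cpu)
{
    if ( busy_budget > 0 )
    {
        busy_budget--;
        return -EBUSY;
    }

    fake_online[cpu] = false;
    return 0;
}

static bool fake_cpu_can_stay_online(unsigned int cpu)
{
    (void)cpu;
    return fake_smt_allowed;
}

static long down_helper_sketch(unsigned int cpu)
{
    int ret = fake_cpu_down(cpu);

    /* Have one more go on EBUSY. */
    if ( ret == -EBUSY )
        ret = fake_cpu_down(cpu);

    return ret;
}

static long up_helper_sketch(unsigned int cpu)
{
    int ret = fake_cpu_up(cpu);

    /* Have one more go on EBUSY. */
    if ( ret == -EBUSY )
        ret = fake_cpu_up(cpu);

    /* The CPU came up but policy forbids it: take it down again. */
    if ( !ret && !fake_cpu_can_stay_online(cpu) )
    {
        ret = down_helper_sketch(cpu);
        if ( !ret )
            ret = -EPERM;
    }

    return ret;
}
```

With this split the retry logic is shared, and an architecture only has to answer the yes/no question of whether a freshly onlined CPU may stay up.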
With CPU hotplug sysctls implemented on Arm it becomes useful to have a tool for calling them. According to the commit history it seems that putting hptool under config MIGRATE was a measure to fix IA64 build. As IA64 is no longer supported it can now be brought back. So build it unconditionally. Operations specific to x86 architecture are moved into a separate file and only built on x86. Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> --- v7->v8: * move x86 specific function into a separate file v6->v7: * no changes v5->v6: * don't change order in Makefile v4->v5: * make hptool always build v3->v4: * no changes v2->v3: * no changes v1->v2: * switch to configure from legacy config --- tools/misc/Makefile | 8 +- tools/misc/xen-hptool-x86.c | 277 ++++++++++++++++++++++++++++++++++ tools/misc/xen-hptool.c | 293 ++---------------------------------- tools/misc/xen-hptool.h | 14 ++ 4 files changed, 311 insertions(+), 281 deletions(-) create mode 100644 tools/misc/xen-hptool-x86.c create mode 100644 tools/misc/xen-hptool.h diff --git a/tools/misc/Makefile b/tools/misc/Makefile index XXXXXXX..XXXXXXX 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -XXX,XX +XXX,XX @@ INSTALL_BIN += xencov_split INSTALL_BIN += $(INSTALL_BIN-y) # Everything to be installed in regular sbin/ -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool +INSTALL_SBIN += xen-hptool INSTALL_SBIN-$(CONFIG_X86) += xen-hvmcrash INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd @@ -XXX,XX +XXX,XX @@ xenhypfs: xenhypfs.o xenlockprof: xenlockprof.o $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) -xen-hptool: xen-hptool.o - $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS) +HPTOOL_OBJS-$(CONFIG_MIGRATE) += xen-hptool-x86.o + +xen-hptool: xen-hptool.o $(HPTOOL_OBJS-y) + $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) 
$(LDLIBS_libxenstore) $(APPEND_LDFLAGS) xenhypfs.o: CFLAGS += $(CFLAGS_libxenhypfs) diff --git a/tools/misc/xen-hptool-x86.c b/tools/misc/xen-hptool-x86.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tools/misc/xen-hptool-x86.c @@ -XXX,XX +XXX,XX @@ +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <xenevtchn.h> +#include <xenctrl.h> +#include <xenguest.h> +#include <xenstore.h> +#include "xen-hptool.h" + +int hp_mem_online_func(int argc, char *argv[], xc_interface *xch) +{ + uint32_t status; + int ret; + unsigned long mfn; + + if (argc != 1) + { + show_help(); + return -1; + } + + sscanf(argv[0], "%lx", &mfn); + printf("Prepare to online MEMORY mfn %lx\n", mfn); + + ret = xc_mark_page_online(xch, mfn, mfn, &status); + + if (ret < 0) + fprintf(stderr, "Onlining page mfn %lx failed, error %x\n", mfn, errno); + else if (status & (PG_ONLINE_FAILED |PG_ONLINE_BROKEN)) { + fprintf(stderr, "Onlining page mfn %lx is broken, " + "Memory online failed\n", mfn); + ret = -1; + } + else if (status & PG_ONLINE_ONLINED) + printf("Memory mfn %lx onlined successfully\n", mfn); + else + printf("Memory is already onlined!\n"); + + return ret; +} + +int hp_mem_query_func(int argc, char *argv[], xc_interface *xch) +{ + uint32_t status; + int ret; + unsigned long mfn; + + if (argc != 1) + { + show_help(); + return -1; + } + + sscanf(argv[0], "%lx", &mfn); + printf("Querying MEMORY mfn %lx status\n", mfn); + ret = xc_query_page_offline_status(xch, mfn, mfn, &status); + + if (ret < 0) + fprintf(stderr, "Querying page mfn %lx failed, error %x\n", mfn, errno); + else + { + printf("Memory Status %x: [", status); + if ( status & PG_OFFLINE_STATUS_OFFLINE_PENDING) + printf(" PAGE_OFFLINE_PENDING "); + if ( status & PG_OFFLINE_STATUS_BROKEN ) + printf(" PAGE_BROKEND "); + if ( status & PG_OFFLINE_STATUS_OFFLINED ) + printf(" PAGE_OFFLINED "); + else + printf(" PAGE_ONLINED "); + printf("]\n"); + } + + return ret; +} + +static int 
suspend_guest(xc_interface *xch, xenevtchn_handle *xce, int domid, + int *evtchn, int *lockfd) +{ + int port, rc, suspend_evtchn = -1; + + *lockfd = -1; + + if (!evtchn) + return -1; + + port = xs_suspend_evtchn_port(domid); + if (port < 0) + { + fprintf(stderr, "DOM%d: No suspend port, try live migration\n", domid); + goto failed; + } + suspend_evtchn = xc_suspend_evtchn_init_exclusive(xch, xce, domid, + port, lockfd); + if (suspend_evtchn < 0) + { + fprintf(stderr, "Suspend evtchn initialization failed\n"); + goto failed; + } + *evtchn = suspend_evtchn; + + rc = xenevtchn_notify(xce, suspend_evtchn); + if (rc < 0) + { + fprintf(stderr, "Failed to notify suspend channel: errno %d\n", rc); + goto failed; + } + if (xc_await_suspend(xch, xce, suspend_evtchn) < 0) + { + fprintf(stderr, "Suspend Failed\n"); + goto failed; + } + return 0; + +failed: + if (suspend_evtchn != -1) + xc_suspend_evtchn_release(xch, xce, domid, + suspend_evtchn, lockfd); + + return -1; +} + +int hp_mem_offline_func(int argc, char *argv[], xc_interface *xch) +{ + uint32_t status, domid; + int ret; + unsigned long mfn; + + if (argc != 1) + { + show_help(); + return -1; + } + + sscanf(argv[0], "%lx", &mfn); + printf("Prepare to offline MEMORY mfn %lx\n", mfn); + ret = xc_mark_page_offline(xch, mfn, mfn, &status); + if (ret < 0) { + fprintf(stderr, "Offlining page mfn %lx failed, error %x\n", mfn, errno); + if (status & (PG_OFFLINE_XENPAGE | PG_OFFLINE_FAILED)) + fprintf(stderr, "XEN_PAGE is not permitted be offlined\n"); + else if (status & (PG_OFFLINE_FAILED | PG_OFFLINE_NOT_CONV_RAM)) + fprintf(stderr, "RESERVED RAM is not permitted to be offlined\n"); + } + else + { + switch(status & PG_OFFLINE_STATUS_MASK) + { + case PG_OFFLINE_OFFLINED: + { + printf("Memory mfn %lx offlined successfully, current state is" + " [PG_OFFLINE_OFFLINED]\n", mfn); + if (status & PG_OFFLINE_BROKEN) + printf("And this offlined PAGE is already marked broken" + " before!\n"); + break; + } + case PG_OFFLINE_FAILED: + { 
+ fprintf(stderr, "Memory mfn %lx offline failed\n", mfn); + if ( status & PG_OFFLINE_ANONYMOUS) + fprintf(stderr, "the memory is an anonymous page!\n"); + ret = -1; + break; + } + case PG_OFFLINE_PENDING: + { + if (status & PG_OFFLINE_XENPAGE) { + ret = -1; + fprintf(stderr, "Memory mfn %lx offlined succssefully," + "this page is xen page, current state is" + " [PG_OFFLINE_PENDING, PG_OFFLINE_XENPAGE]\n", mfn); + } + else if (status & PG_OFFLINE_OWNED) + { + int result, suspend_evtchn = -1, suspend_lockfd = -1; + xenevtchn_handle *xce; + xce = xenevtchn_open(NULL, 0); + + if (xce == NULL) + { + fprintf(stderr, "When exchange page, fail" + " to open evtchn\n"); + return -1; + } + + domid = status >> PG_OFFLINE_OWNER_SHIFT; + if (suspend_guest(xch, xce, domid, + &suspend_evtchn, &suspend_lockfd)) + { + fprintf(stderr, "Failed to suspend guest %d for" + " mfn %lx\n", domid, mfn); + xenevtchn_close(xce); + return -1; + } + + result = xc_exchange_page(xch, domid, mfn); + + /* Exchange page successfully */ + if (result == 0) + printf("Memory mfn %lx offlined successfully, this " + "page is DOM%d page and being swapped " + "successfully, current state is " + "[PG_OFFLINE_OFFLINED, PG_OFFLINE_OWNED]\n", + mfn, domid); + else { + ret = -1; + fprintf(stderr, "Memory mfn %lx offlined successfully" + " , this page is DOM%d page yet failed to be " + "exchanged. 
current state is " + "[PG_OFFLINE_PENDING, PG_OFFLINE_OWNED]\n", + mfn, domid); + } + xc_domain_resume(xch, domid, 1); + xc_suspend_evtchn_release(xch, xce, domid, + suspend_evtchn, &suspend_lockfd); + xenevtchn_close(xce); + } + break; + } + }//end of switch + }//end of if + + return ret; +} + +int main_smt_enable(int argc, char *argv[], xc_interface *xch) +{ + int ret; + + if ( argc ) + { + show_help(); + return -1; + } + + for ( ;; ) + { + ret = xc_smt_enable(xch); + if ( (ret >= 0) || (errno != EBUSY) ) + break; + } + + if ( ret < 0 ) + fprintf(stderr, "Unable to enable SMT: errno %d, %s\n", + errno, strerror(errno)); + else + printf("Enabled SMT\n"); + + return ret; +} + +int main_smt_disable(int argc, char *argv[], xc_interface *xch) +{ + int ret; + + if ( argc ) + { + show_help(); + return -1; + } + + for ( ;; ) + { + ret = xc_smt_disable(xch); + if ( (ret >= 0) || (errno != EBUSY) ) + break; + } + + if ( ret < 0 ) + fprintf(stderr, "Unable to disable SMT: errno %d, %s\n", + errno, strerror(errno)); + else + printf("Disabled SMT\n"); + + return ret; +} diff --git a/tools/misc/xen-hptool.c b/tools/misc/xen-hptool.c index XXXXXXX..XXXXXXX 100644 --- a/tools/misc/xen-hptool.c +++ b/tools/misc/xen-hptool.c @@ -XXX,XX +XXX,XX @@ #include <xenguest.h> #include <xenstore.h> #include <xen-tools/common-macros.h> +#include "xen-hptool.h" -static xc_interface *xch; void show_help(void) { @@ -XXX,XX +XXX,XX @@ void show_help(void) " help display this help\n" " cpu-online <cpuid> online CPU <cpuid>\n" " cpu-offline <cpuid> offline CPU <cpuid>\n" +#if defined(__i386__) || defined(__x86_64__) " mem-online <mfn> online MEMORY <mfn>\n" " mem-offline <mfn> offline MEMORY <mfn>\n" " mem-status <mfn> query Memory status<mfn>\n" " smt-enable onlines all SMT threads\n" " smt-disable offlines all SMT threads\n" +#endif ); } /* wrapper function */ -static int help_func(int argc, char *argv[]) +static int help_func(int argc, char *argv[], xc_interface *xch) { show_help(); return 0; 
} -static int hp_mem_online_func(int argc, char *argv[]) -{ - uint32_t status; - int ret; - unsigned long mfn; - - if (argc != 1) - { - show_help(); - return -1; - } - - sscanf(argv[0], "%lx", &mfn); - printf("Prepare to online MEMORY mfn %lx\n", mfn); - - ret = xc_mark_page_online(xch, mfn, mfn, &status); - - if (ret < 0) - fprintf(stderr, "Onlining page mfn %lx failed, error %x\n", mfn, errno); - else if (status & (PG_ONLINE_FAILED |PG_ONLINE_BROKEN)) { - fprintf(stderr, "Onlining page mfn %lx is broken, " - "Memory online failed\n", mfn); - ret = -1; - } - else if (status & PG_ONLINE_ONLINED) - printf("Memory mfn %lx onlined successfully\n", mfn); - else - printf("Memory is already onlined!\n"); - - return ret; -} - -static int hp_mem_query_func(int argc, char *argv[]) -{ - uint32_t status; - int ret; - unsigned long mfn; - - if (argc != 1) - { - show_help(); - return -1; - } - - sscanf(argv[0], "%lx", &mfn); - printf("Querying MEMORY mfn %lx status\n", mfn); - ret = xc_query_page_offline_status(xch, mfn, mfn, &status); - - if (ret < 0) - fprintf(stderr, "Querying page mfn %lx failed, error %x\n", mfn, errno); - else - { - printf("Memory Status %x: [", status); - if ( status & PG_OFFLINE_STATUS_OFFLINE_PENDING) - printf(" PAGE_OFFLINE_PENDING "); - if ( status & PG_OFFLINE_STATUS_BROKEN ) - printf(" PAGE_BROKEND "); - if ( status & PG_OFFLINE_STATUS_OFFLINED ) - printf(" PAGE_OFFLINED "); - else - printf(" PAGE_ONLINED "); - printf("]\n"); - } - - return ret; -} - -static int suspend_guest(xc_interface *xch, xenevtchn_handle *xce, int domid, - int *evtchn, int *lockfd) -{ - int port, rc, suspend_evtchn = -1; - - *lockfd = -1; - - if (!evtchn) - return -1; - - port = xs_suspend_evtchn_port(domid); - if (port < 0) - { - fprintf(stderr, "DOM%d: No suspend port, try live migration\n", domid); - goto failed; - } - suspend_evtchn = xc_suspend_evtchn_init_exclusive(xch, xce, domid, - port, lockfd); - if (suspend_evtchn < 0) - { - fprintf(stderr, "Suspend evtchn 
initialization failed\n"); - goto failed; - } - *evtchn = suspend_evtchn; - - rc = xenevtchn_notify(xce, suspend_evtchn); - if (rc < 0) - { - fprintf(stderr, "Failed to notify suspend channel: errno %d\n", rc); - goto failed; - } - if (xc_await_suspend(xch, xce, suspend_evtchn) < 0) - { - fprintf(stderr, "Suspend Failed\n"); - goto failed; - } - return 0; - -failed: - if (suspend_evtchn != -1) - xc_suspend_evtchn_release(xch, xce, domid, - suspend_evtchn, lockfd); - - return -1; -} - -static int hp_mem_offline_func(int argc, char *argv[]) -{ - uint32_t status, domid; - int ret; - unsigned long mfn; - - if (argc != 1) - { - show_help(); - return -1; - } - - sscanf(argv[0], "%lx", &mfn); - printf("Prepare to offline MEMORY mfn %lx\n", mfn); - ret = xc_mark_page_offline(xch, mfn, mfn, &status); - if (ret < 0) { - fprintf(stderr, "Offlining page mfn %lx failed, error %x\n", mfn, errno); - if (status & (PG_OFFLINE_XENPAGE | PG_OFFLINE_FAILED)) - fprintf(stderr, "XEN_PAGE is not permitted be offlined\n"); - else if (status & (PG_OFFLINE_FAILED | PG_OFFLINE_NOT_CONV_RAM)) - fprintf(stderr, "RESERVED RAM is not permitted to be offlined\n"); - } - else - { - switch(status & PG_OFFLINE_STATUS_MASK) - { - case PG_OFFLINE_OFFLINED: - { - printf("Memory mfn %lx offlined successfully, current state is" - " [PG_OFFLINE_OFFLINED]\n", mfn); - if (status & PG_OFFLINE_BROKEN) - printf("And this offlined PAGE is already marked broken" - " before!\n"); - break; - } - case PG_OFFLINE_FAILED: - { - fprintf(stderr, "Memory mfn %lx offline failed\n", mfn); - if ( status & PG_OFFLINE_ANONYMOUS) - fprintf(stderr, "the memory is an anonymous page!\n"); - ret = -1; - break; - } - case PG_OFFLINE_PENDING: - { - if (status & PG_OFFLINE_XENPAGE) { - ret = -1; - fprintf(stderr, "Memory mfn %lx offlined succssefully," - "this page is xen page, current state is" - " [PG_OFFLINE_PENDING, PG_OFFLINE_XENPAGE]\n", mfn); - } - else if (status & PG_OFFLINE_OWNED) - { - int result, suspend_evtchn = -1, 
suspend_lockfd = -1; - xenevtchn_handle *xce; - xce = xenevtchn_open(NULL, 0); - - if (xce == NULL) - { - fprintf(stderr, "When exchange page, fail" - " to open evtchn\n"); - return -1; - } - - domid = status >> PG_OFFLINE_OWNER_SHIFT; - if (suspend_guest(xch, xce, domid, - &suspend_evtchn, &suspend_lockfd)) - { - fprintf(stderr, "Failed to suspend guest %d for" - " mfn %lx\n", domid, mfn); - xenevtchn_close(xce); - return -1; - } - - result = xc_exchange_page(xch, domid, mfn); - - /* Exchange page successfully */ - if (result == 0) - printf("Memory mfn %lx offlined successfully, this " - "page is DOM%d page and being swapped " - "successfully, current state is " - "[PG_OFFLINE_OFFLINED, PG_OFFLINE_OWNED]\n", - mfn, domid); - else { - ret = -1; - fprintf(stderr, "Memory mfn %lx offlined successfully" - " , this page is DOM%d page yet failed to be " - "exchanged. current state is " - "[PG_OFFLINE_PENDING, PG_OFFLINE_OWNED]\n", - mfn, domid); - } - xc_domain_resume(xch, domid, 1); - xc_suspend_evtchn_release(xch, xce, domid, - suspend_evtchn, &suspend_lockfd); - xenevtchn_close(xce); - } - break; - } - }//end of switch - }//end of if - - return ret; -} - -static int exec_cpu_hp_fn(int (*hp_fn)(xc_interface *, int), int cpu) +static int exec_cpu_hp_fn(int (*hp_fn)(xc_interface *, int), int cpu, + xc_interface *xch) { int ret; @@ -XXX,XX +XXX,XX @@ static int exec_cpu_hp_fn(int (*hp_fn)(xc_interface *, int), int cpu) return ret; } -static int hp_cpu_online_func(int argc, char *argv[]) +static int hp_cpu_online_func(int argc, char *argv[], xc_interface *xch) { int cpu, ret; @@ -XXX,XX +XXX,XX @@ static int hp_cpu_online_func(int argc, char *argv[]) cpu = atoi(argv[0]); printf("Prepare to online CPU %d\n", cpu); - ret = exec_cpu_hp_fn(xc_cpu_online, cpu); + ret = exec_cpu_hp_fn(xc_cpu_online, cpu, xch); if (ret < 0) fprintf(stderr, "CPU %d online failed (error %d: %s)\n", cpu, errno, strerror(errno)); @@ -XXX,XX +XXX,XX @@ static int hp_cpu_online_func(int argc, char 
*argv[]) return ret; } -static int hp_cpu_offline_func(int argc, char *argv[]) +static int hp_cpu_offline_func(int argc, char *argv[], xc_interface *xch) { int cpu, ret; @@ -XXX,XX +XXX,XX @@ static int hp_cpu_offline_func(int argc, char *argv[]) } cpu = atoi(argv[0]); printf("Prepare to offline CPU %d\n", cpu); - ret = exec_cpu_hp_fn(xc_cpu_offline, cpu); + ret = exec_cpu_hp_fn(xc_cpu_offline, cpu, xch); if (ret < 0) fprintf(stderr, "CPU %d offline failed (error %d: %s)\n", cpu, errno, strerror(errno)); @@ -XXX,XX +XXX,XX @@ static int hp_cpu_offline_func(int argc, char *argv[]) return ret; } -static int main_smt_enable(int argc, char *argv[]) -{ - int ret; - - if ( argc ) - { - show_help(); - return -1; - } - - for ( ;; ) - { - ret = xc_smt_enable(xch); - if ( (ret >= 0) || (errno != EBUSY) ) - break; - } - - if ( ret < 0 ) - fprintf(stderr, "Unable to enable SMT: errno %d, %s\n", - errno, strerror(errno)); - else - printf("Enabled SMT\n"); - - return ret; -} - -static int main_smt_disable(int argc, char *argv[]) -{ - int ret; - - if ( argc ) - { - show_help(); - return -1; - } - - for ( ;; ) - { - ret = xc_smt_disable(xch); - if ( (ret >= 0) || (errno != EBUSY) ) - break; - } - - if ( ret < 0 ) - fprintf(stderr, "Unable to disable SMT: errno %d, %s\n", - errno, strerror(errno)); - else - printf("Disabled SMT\n"); - - return ret; -} - struct { const char *name; - int (*function)(int argc, char *argv[]); + int (*function)(int argc, char *argv[], xc_interface *xch); } main_options[] = { { "help", help_func }, { "cpu-online", hp_cpu_online_func }, { "cpu-offline", hp_cpu_offline_func }, +#if defined(__i386__) || defined(__x86_64__) { "mem-status", hp_mem_query_func}, { "mem-online", hp_mem_online_func}, { "mem-offline", hp_mem_offline_func}, { "smt-enable", main_smt_enable }, { "smt-disable", main_smt_disable }, +#endif }; int main(int argc, char *argv[]) { int i, ret; + xc_interface *xch; if (argc < 2) { @@ -XXX,XX +XXX,XX @@ int main(int argc, char *argv[]) return 
1; } - ret = main_options[i].function(argc -2, argv + 2); + ret = main_options[i].function(argc -2, argv + 2, xch); xc_interface_close(xch); diff --git a/tools/misc/xen-hptool.h b/tools/misc/xen-hptool.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tools/misc/xen-hptool.h @@ -XXX,XX +XXX,XX @@ +#ifndef __XEN_HPTOOL_H__ +#define __XEN_HPTOOL_H__ + +#if defined(__i386__) || defined(__x86_64__) +int hp_mem_online_func(int argc, char *argv[], xc_interface *xch); +int hp_mem_query_func(int argc, char *argv[], xc_interface *xch); +int hp_mem_offline_func(int argc, char *argv[], xc_interface *xch); +int main_smt_enable(int argc, char *argv[], xc_interface *xch); +int main_smt_disable(int argc, char *argv[], xc_interface *xch); +#endif + +void show_help(void); + +#endif /* __XEN_HPTOOL_H__ */ -- 2.51.2
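The xen-hptool change above replaces the file-scope xch with a handle passed to every command handler, which is what lets the x86-only handlers live in a separate file. A reduced sketch of that dispatch shape is given below. struct handle_sketch is a hypothetical stand-in for xc_interface, and cmd_help(), cmd_cpu_online() and dispatch() are illustrative names, not the tool's real functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xc_interface; it only counts handler calls. */
struct handle_sketch {
    int calls;
};

typedef int (*cmd_fn)(int argc, char *argv[], struct handle_sketch *h);

static int cmd_help(int argc, char *argv[], struct handle_sketch *h)
{
    (void)argc;
    (void)argv;
    h->calls++;
    return 0;
}

static int cmd_cpu_online(int argc, char *argv[], struct handle_sketch *h)
{
    (void)argv;

    /* Exactly one argument (the CPU id) is expected. */
    if ( argc != 1 )
        return -1;

    h->calls++;
    return 0;
}

/* Command table: every handler receives the handle explicitly. */
static const struct {
    const char *name;
    cmd_fn fn;
} cmds[] = {
    { "help", cmd_help },
    { "cpu-online", cmd_cpu_online },
};

static int dispatch(const char *name, int argc, char *argv[],
                    struct handle_sketch *h)
{
    for ( size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++ )
        if ( !strcmp(cmds[i].name, name) )
            return cmds[i].fn(argc, argv, h);

    return 1; /* unknown command */
}
```

Because no handler relies on a file-local global, handlers defined in another translation unit can be added to the table with only a prototype in a shared header.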
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
v7->v8:
* remove support status update
* update config option name
v6->v7:
* add testing and limitations
v5->v6:
* no changes
v4->v5:
* s/supported/implemented/
* update SUPPORT.md
v3->v4:
* update configuration section
v2->v3:
* patch introduced
---
 docs/misc/cpu-hotplug.txt | 97 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 docs/misc/cpu-hotplug.txt

diff --git a/docs/misc/cpu-hotplug.txt b/docs/misc/cpu-hotplug.txt
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/misc/cpu-hotplug.txt
@@ -XXX,XX +XXX,XX @@
+CPU Hotplug
+===========
+
+CPU hotplug is a feature that allows pCPU cores to be added to or removed from a
+running system without requiring a reboot. It is implemented on x86 and Arm64
+architectures.
+
+Implementation Details
+----------------------
+
+CPU hotplug is implemented through the `XEN_SYSCTL_CPU_HOTPLUG_*` sysctl calls.
+The specific calls are:
+
+- `XEN_SYSCTL_CPU_HOTPLUG_ONLINE`: Brings a pCPU online
+- `XEN_SYSCTL_CPU_HOTPLUG_OFFLINE`: Takes a pCPU offline
+- `XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE`: Enables SMT threads (x86 only)
+- `XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE`: Disables SMT threads (x86 only)
+
+All cores can be disabled, assuming hardware support, except for the boot core.
+Sysctl calls are routed to the boot core before doing any actual up/down
+operations on other cores.
+
+If there are Xen-bound interrupts pinned to the pCPU being offlined, they will
+be automatically migrated to other online pCPUs. Interrupts used by guest
+domains are handled by the scheduler when it reschedules the vCPUs to a new,
+online, pCPU. When a pCPU is being onlined, some Xen-bound interrupts will get
+redistributed to the newly onlined pCPU to prevent imbalance.
+
+If the pCPU being offlined has vCPUs pinned to it, they will be automatically
+unpinned and migrated to other online pCPUs.
+
+Limitations
+-----------
+
+On Arm64, CPU hotplug is currently not compatible with ITS, due to an issue with
+the redistributor assignment.
+
+On Arm64, FFA may have problems if secure FW supports the notification ABI.
+
+Configuration
+-------------
+
+The feature is controlled by the CONFIG_CPU_ONLINE_OFFLINE option.
+It is enabled by default on x86 architecture. On Arm64, the option is disabled
+by default and marked as EXPERT. The xen-hptool userspace tool is built
+unconditionally.
+
+Usage
+-----
+
+Disable core:
+
+$ xen-hptool cpu-offline 2
+Prepare to offline CPU 2
+(XEN) Removing cpu 2 from runqueue 0
+CPU 2 offlined successfully
+
+Enable core:
+
+$ xen-hptool cpu-online 2
+Prepare to online CPU 2
+(XEN) Bringing up CPU2
+(XEN) GICv3: CPU2: Found redistributor in region 0 @00000a004005c000
+(XEN) CPU2: Guest atomics will try 1 times before pausing the domain
+(XEN) CPU 2 booted.
+(XEN) Adding cpu 2 to runqueue 0
+CPU 2 onlined successfully
+
+Disabling a core with pinned vCPUs:
+
+$ xl vcpu-pin 0 3 3 3
+$ xl vcpu-pin 0 2 3 3
+$ xl vcpu-pin 0 1 3 3
+$ xl vcpu-pin 0 0 3 3
+$ xen-hptool cpu-offline 3
+Prepare to offline CPU 3
+(XEN) Breaking affinity for d0v0
+(XEN) Breaking affinity for d0v1
+(XEN) Breaking affinity for d0v2
+(XEN) Breaking affinity for d0v3
+(XEN) Removing cpu 3 from runqueue 0
+CPU 3 offlined successfully
+
+Testing
+-------
+
+The CPU hotplug feature has been tested on both x86 and Arm64 QEMU setups and on
+R-Car Gen5 (Arm64) hardware.
+
+The tests included:
+- Offlining and onlining cores with no pinned vCPUs
+- Offlining cores with pinned vCPUs
+- Offlining cores with Xen-bound interrupts
+- Offlining all cores except the boot core
+- Offlining the boot core (expected to fail)
+- Enabling and disabling SMT threads (x86 only)
+- Offlining cores to which guests with passthrough devices are pinned
--
2.51.2
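The offline rules in the document above (the boot core cannot be taken offline, and interrupts routed to a departing pCPU are moved to another online pCPU) can be pictured with a small toy model. It is only a sketch under stated assumptions: `struct demo_system`, `demo_pick_target` and `demo_cpu_offline` are hypothetical names, CPU 0 is assumed to be the boot core, and "lowest-numbered remaining online pCPU" is a target policy chosen for the example. It is not Xen's actual interrupt migration code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 32u
#define DEMO_NR_IRQS 4u

/* Hypothetical model: one bit per pCPU in 'online', bit 0 is the boot core. */
struct demo_system {
    uint32_t online;                     /* mask of online pCPUs */
    unsigned int irq_cpu[DEMO_NR_IRQS];  /* pCPU each interrupt is routed to */
};

/* Return the lowest-numbered online pCPU other than 'dying', or -1 if none. */
static int demo_pick_target(uint32_t online, unsigned int dying)
{
    uint32_t candidates = online & ~(UINT32_C(1) << dying);
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        if (candidates & (UINT32_C(1) << cpu))
            return (int)cpu;
    return -1;
}

/*
 * Take 'cpu' offline in the model. The boot core (0), out-of-range CPUs and
 * CPUs that are already offline are rejected with -1. Interrupts routed to
 * the departing pCPU are re-routed before it is cleared from the online mask.
 */
static int demo_cpu_offline(struct demo_system *s, unsigned int cpu)
{
    int target;
    unsigned int i;

    assert(s != NULL);
    if (cpu == 0 || cpu >= DEMO_NR_CPUS || !(s->online & (UINT32_C(1) << cpu)))
        return -1;

    target = demo_pick_target(s->online, cpu);
    if (target < 0)
        return -1;

    for (i = 0; i < DEMO_NR_IRQS; i++)
        if (s->irq_cpu[i] == cpu)
            s->irq_cpu[i] = (unsigned int)target;

    s->online &= ~(UINT32_C(1) << cpu);
    return 0;
}
```

With four pCPUs online and two interrupts routed to pCPU 2, offlining pCPU 2 in this model moves both interrupts to pCPU 0 and leaves the interrupt on pCPU 3 untouched, while a request to offline pCPU 0 is refused.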