[PATCH RFC V6 05/24] arm/virt, kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init

salil.mehta@opnsrc.net posted 24 patches 1 month, 2 weeks ago
Posted by salil.mehta@opnsrc.net 1 month, 2 weeks ago
From: Salil Mehta <salil.mehta@huawei.com>

The ARM CPU architecture does not allow CPUs to be plugged in after the system
has initialized. Hence, the kernel must know all the CPUs being booted during
its initialization. This constraint applies to the guest kernel as well;
therefore, the number of KVM vCPU descriptors in the host must be fixed at VM
initialization time.

Also, the GIC must know all the CPUs it is connected to during its
initialization, and this cannot change afterward. This must also be ensured
during the initialization of the VGIC in KVM. This is necessary because:

1. The association between GICR and MPIDR must be fixed at VM initialization
   time. This is represented by the register
   `GICR_TYPER(mp_affinity, proc_num)`.
2. Memory regions associated with GICR, etc., cannot be changed (added,
   deleted, or modified) after the VM has been initialized. This is not an
   ARM architectural constraint; rather, lifting it would require a difficult
   and messy change to the VGIC data structures.
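
To make point 1 concrete, here is a minimal sketch of how a redistributor's
GICR_TYPER value ties a fixed processor number to a CPU's MPIDR affinity. The
field positions follow the GICv3 architecture (Affinity_Value in bits [63:32],
Processor_Number in bits [23:8], Last in bit 4). The helper names
`mpidr_to_packed_affinity()` and `gicr_typer_sketch()` are hypothetical and are
not taken from QEMU or KVM.

```c
#include <assert.h>
#include <stdint.h>

/*
 * MPIDR_EL1 affinity fields: Aff0 [7:0], Aff1 [15:8], Aff2 [23:16],
 * Aff3 [39:32]. Pack them as Aff3.Aff2.Aff1.Aff0 into 32 bits.
 */
static inline uint32_t mpidr_to_packed_affinity(uint64_t mpidr)
{
    uint32_t aff0 = (uint32_t)(mpidr & 0xff);
    uint32_t aff1 = (uint32_t)((mpidr >> 8) & 0xff);
    uint32_t aff2 = (uint32_t)((mpidr >> 16) & 0xff);
    uint32_t aff3 = (uint32_t)((mpidr >> 32) & 0xff);

    return (aff3 << 24) | (aff2 << 16) | (aff1 << 8) | aff0;
}

/*
 * Hypothetical helper: compose the GICR_TYPER fields discussed above.
 * Only Affinity_Value, Processor_Number and Last are modelled here.
 */
static inline uint64_t gicr_typer_sketch(uint64_t mpidr, uint16_t proc_num,
                                         int is_last)
{
    uint64_t typer = (uint64_t)mpidr_to_packed_affinity(mpidr) << 32;

    typer |= (uint64_t)proc_num << 8;
    if (is_last) {
        typer |= UINT64_C(1) << 4;
    }
    return typer;
}
```

Because both inputs are baked into a register the guest reads during GIC
probing, the (mp_affinity, proc_num) pair for every possible vCPU has to be
known before the VGIC is initialized, including for vCPUs that start disabled.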

To enable a hot-add–like model while preserving these constraints, the virt
machine may enumerate more CPUs than are enabled at boot using
`-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
administratively disabled at init). The topology remains fixed at VM
creation time; only the online/offline status may change later.

Administratively disabled vCPUs are not realized in QOM until first enabled,
avoiding creation of unnecessary vCPU threads at boot. On large systems, this
reduces startup time proportionally to the number of disabled vCPUs. Once a
QOM vCPU is realized and its thread created, subsequent enable/disable actions
do not unrealize it. This behaviour was adopted following review feedback and
differs from earlier RFC versions.
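
The lifecycle described above can be sketched as a small state model. This is
an illustration only: `struct vcpu_model` and the `model_*()` helpers are
hypothetical names, not QEMU types, and the model assumes that every possible
vCPU gets a KVM vCPU at machine init while only the boot-time-enabled ones are
realized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum admin_state { ADMIN_ENABLED, ADMIN_DISABLED };

/* Hypothetical per-vCPU bookkeeping for this sketch. */
struct vcpu_model {
    enum admin_state admin;
    bool kvm_vcpu_exists;   /* KVM vCPU created (and possibly parked) */
    bool realized;          /* QOM vCPU realized, i.e. a thread exists */
};

/* Machine init: the topology is fixed, only the first boot_cpus are realized. */
static void model_machine_init(struct vcpu_model *v, size_t max_cpus,
                               size_t boot_cpus)
{
    assert(boot_cpus <= max_cpus);
    for (size_t n = 0; n < max_cpus; n++) {
        v[n].kvm_vcpu_exists = true;
        v[n].realized = n < boot_cpus;
        v[n].admin = n < boot_cpus ? ADMIN_ENABLED : ADMIN_DISABLED;
    }
}

/* First enable realizes the vCPU; later enables only flip the admin state. */
static void model_enable(struct vcpu_model *v)
{
    v->realized = true;
    v->admin = ADMIN_ENABLED;
}

/* Disable never unrealizes: the thread created earlier is kept. */
static void model_disable(struct vcpu_model *v)
{
    v->admin = ADMIN_DISABLED;
}

static size_t model_count_realized(const struct vcpu_model *v, size_t max_cpus)
{
    size_t count = 0;

    for (size_t n = 0; n < max_cpus; n++) {
        count += v[n].realized ? 1 : 0;
    }
    return count;
}
```

With six possible vCPUs and four enabled at boot, the model starts with four
realized vCPUs; enabling a fifth raises that to five, and disabling it again
leaves the count at five.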

Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 accel/kvm/kvm-all.c    |  2 +-
 hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
 hw/core/qdev.c         | 17 ++++++++++
 include/hw/qdev-core.h | 19 +++++++++++
 include/system/kvm.h   |  8 +++++
 target/arm/cpu.c       |  2 ++
 target/arm/kvm.c       | 40 +++++++++++++++++++++-
 target/arm/kvm_arm.h   | 11 ++++++
 8 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 890d5ea9f8..0e7d9d5c3d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -460,7 +460,7 @@ static void kvm_reset_parked_vcpus(KVMState *s)
  *
  * @returns: 0 when success, errno (<0) when failed.
  */
-static int kvm_create_vcpu(CPUState *cpu)
+int kvm_create_vcpu(CPUState *cpu)
 {
     unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
     KVMState *s = kvm_state;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4ded19dc69..f4eeeacf6c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2152,6 +2152,49 @@ static void virt_post_cpus_gic_realized(VirtMachineState *vms,
     }
 }
 
+static void
+virt_setup_lazy_vcpu_realization(Object *cpuobj, VirtMachineState *vms)
+{
+    /*
+     * Present & administratively disabled vCPUs:
+     *
+     * These CPUs are marked offline at init via '-smp disabledcpus=N'. We
+     * intentionally do not realize them during the first boot, since it is
+     * not known if or when they will ever be enabled. The decision to enable
+     * such CPUs depends on policy (e.g. guided by SLAs or other deployment
+     * requirements).
+     *
+     * Realizing all disabled vCPUs up front would make boot time proportional
+     * to 'maxcpus', even if policy permits only a small subset to be enabled.
+     * This can lead to unacceptable boot delays in some scenarios.
+     *
+     * Instead, these CPUs remain administratively disabled and unrealized at
+     * boot, to be instantiated and brought online only if policy later allows
+     * it.
+     */
+
+    /* set this vCPU to be administratively 'disabled' in QOM */
+    qdev_disable(DEVICE(cpuobj), NULL, &error_fatal);
+
+    if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
+        object_property_set_int(cpuobj, "psci-conduit", vms->psci_conduit,
+                                NULL);
+    }
+
+    /*
+     * [!] Constraint: The ARM CPU architecture does not permit new CPUs
+     * to be added after system initialization.
+     *
+     * Workaround: Pre-create KVM vCPUs even for those that are not yet
+     * online i.e. powered-off, keeping them `parked` and in an
+     * `unrealized (at least during boot time)` state within QEMU until
+     * they are powered-on and made online.
+     */
+    if (kvm_enabled()) {
+        kvm_arm_create_host_vcpu(ARM_CPU(cpuobj));
+    }
+}
+
 static void machvirt_init(MachineState *machine)
 {
     VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -2319,10 +2362,6 @@ static void machvirt_init(MachineState *machine)
         Object *cpuobj;
         CPUState *cs;
 
-        if (n >= smp_cpus) {
-            break;
-        }
-
         cpuobj = object_new(possible_cpus->cpus[n].type);
         object_property_set_int(cpuobj, "mp-affinity",
                                 possible_cpus->cpus[n].arch_id, NULL);
@@ -2427,8 +2466,34 @@ static void machvirt_init(MachineState *machine)
             }
         }
 
-        qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
-        object_unref(cpuobj);
+        /* start secondary vCPUs in a powered-down state */
+        if (n && mc->has_online_capable_cpus) {
+            object_property_set_bool(cpuobj, "start-powered-off", true, NULL);
+        }
+
+        if (n < smp_cpus) {
+            /* 'Present' & 'Enabled' vCPUs */
+            qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
+            object_unref(cpuobj);
+        } else {
+            /* 'Present' & 'Disabled' vCPUs */
+            virt_setup_lazy_vcpu_realization(cpuobj, vms);
+        }
+
+        /*
+         * All possible vCPUs should have QOM vCPU Object pointer & arch-id.
+         * 'cpus_queue' (accessed via qemu_get_cpu()) contains only realized and
+         * enabled vCPUs. Hence, we must now populate the 'possible_cpus' list.
+         */
+        if (kvm_enabled()) {
+            /*
+             * Override the default architecture ID with the one retrieved
+             * from KVM, as they currently differ.
+             */
+            machine->possible_cpus->cpus[n].arch_id =
+                arm_cpu_mp_affinity(ARM_CPU(cs));
+        }
+        machine->possible_cpus->cpus[n].cpu = cs;
     }
 
     /* Now we've created the CPUs we can see if they have the hypvirt timer */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 8502d6216f..5816abae39 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -309,6 +309,23 @@ void qdev_assert_realized_properly(void)
                                    qdev_assert_realized_properly_cb, NULL);
 }
 
+bool qdev_disable(DeviceState *dev, BusState *bus, Error **errp)
+{
+    g_assert(dev);
+
+    if (bus) {
+        error_setg(errp, "Device %s 'disable' operation not supported",
+                   object_get_typename(OBJECT(dev)));
+        return false;
+    }
+
+    /* devices like cpu don't have bus */
+    g_assert(!DEVICE_GET_CLASS(dev)->bus_type);
+
+    return object_property_set_str(OBJECT(dev), "admin_power_state", "disabled",
+                                   errp);
+}
+
 bool qdev_machine_modified(void)
 {
     return qdev_hot_added || qdev_hot_removed;
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 3bc212ab3a..2c22b32a3f 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -570,6 +570,25 @@ bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp);
  */
 bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
 
+/**
+ * qdev_disable - Initiate administrative disablement and power-off of device
+ * @dev:   The device to be administratively powered off
+ * @bus:   The bus on which the device resides (may be NULL for CPUs)
+ * @errp:  Pointer to a location where an error can be reported
+ *
+ * This function initiates an administrative transition of the device into a
+ * DISABLED state. This may trigger a graceful shutdown process depending on
+ * platform capabilities. For ACPI platforms, this typically involves notifying
+ * the guest via events such as Notify(..., 0x03) and executing _EJx.
+ *
+ * Once completed, the device's operational power is turned off and it is
+ * marked as administratively DISABLED. Further guest usage is blocked until
+ * re-enabled by host-side policy.
+ *
+ * Returns true on success; false if an error occurs, with @errp populated.
+ */
+bool qdev_disable(DeviceState *dev, BusState *bus, Error **errp);
+
 /**
  * qdev_unrealize: Unrealize a device
  * @dev: device to unrealize
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 3c7d314736..4896a3c9c5 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -317,6 +317,14 @@ int kvm_create_device(KVMState *s, uint64_t type, bool test);
  */
 bool kvm_device_supported(int vmfd, uint64_t type);
 
+/**
+ * kvm_create_vcpu - Gets a parked KVM vCPU or creates a KVM vCPU
+ * @cpu: QOM CPUState object for which KVM vCPU has to be fetched/created.
+ *
+ * @returns: 0 when success, errno (<0) when failed.
+ */
+int kvm_create_vcpu(CPUState *cpu);
+
 /**
  * kvm_park_vcpu - Park QEMU KVM vCPU context
  * @cpu: QOM CPUState object for which QEMU KVM vCPU context has to be parked.
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7e0d5b2ed8..a5906d1672 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1500,6 +1500,8 @@ static void arm_cpu_initfn(Object *obj)
         /* TCG and HVF implement PSCI 1.1 */
         cpu->psci_version = QEMU_PSCI_VERSION_1_1;
     }
+
+    CPU(obj)->thread_id = 0;
 }
 
 /*
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 6672344855..1962eb29b2 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -991,6 +991,38 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu)
     write_list_to_cpustate(cpu);
 }
 
+void kvm_arm_create_host_vcpu(ARMCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+    unsigned long vcpu_id = cs->cpu_index;
+    int ret;
+
+    ret = kvm_create_vcpu(cs);
+    if (ret < 0) {
+        error_report("Failed to create host vcpu %lu", vcpu_id);
+        abort();
+    }
+
+    /*
+     * Initialize the vCPU in the host. This will reset the sys regs
+     * for this vCPU and related registers like MPIDR_EL1 etc. also
+     * get programmed during this call to host. These are referenced
+     * later while setting device attributes of the GICR during GICv3
+     * reset.
+     */
+    ret = kvm_arch_init_vcpu(cs);
+    if (ret < 0) {
+        error_report("Failed to initialize host vcpu %lu", vcpu_id);
+        abort();
+    }
+
+    /*
+     * Park the created vCPU. It will be picked up by kvm_get_vcpu() when
+     * threads are created during realization of ARM vCPUs.
+     */
+    kvm_park_vcpu(cs);
+}
+
 /*
  * Update KVM's MP_STATE based on what QEMU thinks it is
  */
@@ -1876,7 +1908,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
         return -EINVAL;
     }
 
-    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
+    /*
+     * Install VM change handler only when vCPU thread has been spawned
+     * i.e. vCPU is being realized
+     */
+    if (cs->thread_id) {
+        qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
+    }
 
     /* Determine init features for this CPU */
     memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 6a9b6374a6..ec9dc95ee8 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -98,6 +98,17 @@ bool kvm_arm_cpu_post_load(ARMCPU *cpu);
 void kvm_arm_reset_vcpu(ARMCPU *cpu);
 
 struct kvm_vcpu_init;
+
+/**
+ * kvm_arm_create_host_vcpu:
+ * @cpu: ARMCPU
+ *
+ * Called to pre-create possible KVM vCPU within the host during the
+ * `virt_machine` initialization phase. This pre-created vCPU will be parked and
+ * will be reused when ARM QOM vCPU is actually hotplugged.
+ */
+void kvm_arm_create_host_vcpu(ARMCPU *cpu);
+
 /**
  * kvm_arm_create_scratch_host_vcpu:
  * @fdarray: filled in with kvmfd, vmfd, cpufd file descriptors in that order
-- 
2.34.1
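
For readers following the park-and-reuse flow that kvm_arm_create_host_vcpu()
relies on, here is a minimal, self-contained model of the idea: a vCPU is
created once, parked under its vcpu_id, and handed back later instead of being
created again. Everything in this sketch (`struct park_list`, `park_vcpu()`,
`get_vcpu_fd()`, `fake_create_vcpu()`) is hypothetical and stands in for the
real KVM accelerator code, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARKED 16

/* Hypothetical parked-vCPU record: an id and the descriptor kept for reuse. */
struct parked_vcpu {
    unsigned long vcpu_id;
    int fd;
    int in_use;
};

struct park_list {
    struct parked_vcpu slot[MAX_PARKED];
};

/* Stand-in for vCPU creation; counts calls so that reuse is observable. */
static int created_count;

static int fake_create_vcpu(unsigned long vcpu_id)
{
    created_count++;
    return 100 + (int)vcpu_id;
}

/* Park a descriptor under its id. Returns 0 on success, -1 if full. */
static int park_vcpu(struct park_list *pl, unsigned long vcpu_id, int fd)
{
    for (size_t i = 0; i < MAX_PARKED; i++) {
        if (!pl->slot[i].in_use) {
            pl->slot[i].vcpu_id = vcpu_id;
            pl->slot[i].fd = fd;
            pl->slot[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Get-or-create: reuse a parked descriptor if one exists, else create one. */
static int get_vcpu_fd(struct park_list *pl, unsigned long vcpu_id,
                       int (*create_fn)(unsigned long))
{
    for (size_t i = 0; i < MAX_PARKED; i++) {
        if (pl->slot[i].in_use && pl->slot[i].vcpu_id == vcpu_id) {
            pl->slot[i].in_use = 0;     /* unpark: the entry is consumed */
            return pl->slot[i].fd;
        }
    }
    return create_fn(vcpu_id);
}
```

In this model, the machine-init path would call get_vcpu_fd() followed by
park_vcpu() for each disabled vCPU, and the later realization path would call
get_vcpu_fd() again and receive the same descriptor without a second creation.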


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Gavin Shan 3 weeks, 2 days ago
Hi Salil,

On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> From: Salil Mehta <salil.mehta@huawei.com>
> 
> ARM CPU architecture does not allow CPUs to be plugged after system has
> initialized. This is a constraint. Hence, the Kernel must know all the CPUs
> being booted during its initialization. This applies to the Guest Kernel as
> well and therefore, the number of KVM vCPU descriptors in the host must be
> fixed at VM initialization time.
> 
> Also, the GIC must know all the CPUs it is connected to during its
> initialization, and this cannot change afterward. This must also be ensured
> during the initialization of the VGIC in KVM. This is necessary because:
> 
> 1. The association between GICR and MPIDR must be fixed at VM initialization
>     time. This is represented by the register
>     `GICR_TYPER(mp_affinity, proc_num)`.
> 2. Memory regions associated with GICR, etc., cannot be changed (added,
>     deleted, or modified) after the VM has been initialized. This is not an
>     ARM architectural constraint but rather invites a difficult and messy
>     change in VGIC data structures.
> 
> To enable a hot-add–like model while preserving these constraints, the virt
> machine may enumerate more CPUs than are enabled at boot using
> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> administratively disabled at init). The topology remains fixed at VM
> creation time; only the online/offline status may change later.
> 
> Administratively disabled vCPUs are not realized in QOM until first enabled,
> avoiding creation of unnecessary vCPU threads at boot. On large systems, this
> reduces startup time proportionally to the number of disabled vCPUs. Once a
> QOM vCPU is realized and its thread created, subsequent enable/disable actions
> do not unrealize it. This behaviour was adopted following review feedback and
> differs from earlier RFC versions.
> 
> Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
> Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>   accel/kvm/kvm-all.c    |  2 +-
>   hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
>   hw/core/qdev.c         | 17 ++++++++++
>   include/hw/qdev-core.h | 19 +++++++++++
>   include/system/kvm.h   |  8 +++++
>   target/arm/cpu.c       |  2 ++
>   target/arm/kvm.c       | 40 +++++++++++++++++++++-
>   target/arm/kvm_arm.h   | 11 ++++++
>   8 files changed, 168 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 890d5ea9f8..0e7d9d5c3d 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -460,7 +460,7 @@ static void kvm_reset_parked_vcpus(KVMState *s)
>    *
>    * @returns: 0 when success, errno (<0) when failed.
>    */
> -static int kvm_create_vcpu(CPUState *cpu)
> +int kvm_create_vcpu(CPUState *cpu)
>   {
>       unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>       KVMState *s = kvm_state;
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4ded19dc69..f4eeeacf6c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2152,6 +2152,49 @@ static void virt_post_cpus_gic_realized(VirtMachineState *vms,
>       }
>   }
>   
> +static void
> +virt_setup_lazy_vcpu_realization(Object *cpuobj, VirtMachineState *vms)
> +{
> +    /*
> +     * Present & administratively disabled vCPUs:
> +     *
> +     * These CPUs are marked offline at init via '-smp disabledcpus=N'. We
> +     * intentionally do not realize them during the first boot, since it is
> +     * not known if or when they will ever be enabled. The decision to enable
> +     * such CPUs depends on policy (e.g. guided by SLAs or other deployment
> +     * requirements).
> +     *
> +     * Realizing all disabled vCPUs up front would make boot time proportional
> +     * to 'maxcpus', even if policy permits only a small subset to be enabled.
> +     * This can lead to unacceptable boot delays in some scenarios.
> +     *
> +     * Instead, these CPUs remain administratively disabled and unrealized at
> +     * boot, to be instantiated and brought online only if policy later allows
> +     * it.
> +     */
> +
> +    /* set this vCPU to be administratively 'disabled' in QOM */
> +    qdev_disable(DEVICE(cpuobj), NULL, &error_fatal);
> +
> +    if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
> +        object_property_set_int(cpuobj, "psci-conduit", vms->psci_conduit,
> +                                NULL);
> +    }
> +
> +    /*
> +     * [!] Constraint: The ARM CPU architecture does not permit new CPUs
> +     * to be added after system initialization.
> +     *
> +     * Workaround: Pre-create KVM vCPUs even for those that are not yet
> +     * online i.e. powered-off, keeping them `parked` and in an
> +     * `unrealized (at-least during boot time)` state within QEMU until
> +     * they are powered-on and made online.
> +     */
> +    if (kvm_enabled()) {
> +        kvm_arm_create_host_vcpu(ARM_CPU(cpuobj));
> +    }
> +}
> +
>   static void machvirt_init(MachineState *machine)
>   {
>       VirtMachineState *vms = VIRT_MACHINE(machine);
> @@ -2319,10 +2362,6 @@ static void machvirt_init(MachineState *machine)
>           Object *cpuobj;
>           CPUState *cs;
>   
> -        if (n >= smp_cpus) {
> -            break;
> -        }
> -
>           cpuobj = object_new(possible_cpus->cpus[n].type);
>           object_property_set_int(cpuobj, "mp-affinity",
>                                   possible_cpus->cpus[n].arch_id, NULL);
> @@ -2427,8 +2466,34 @@ static void machvirt_init(MachineState *machine)
>               }
>           }
>   
> -        qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
> -        object_unref(cpuobj);
> +        /* start secondary vCPUs in a powered-down state */
> +        if(n && mc->has_online_capable_cpus) {
> +            object_property_set_bool(cpuobj, "start-powered-off", true, NULL);
> +        }
> +
> +        if (n < smp_cpus) {
> +            /* 'Present' & 'Enabled' vCPUs */
> +            qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
> +            object_unref(cpuobj);
> +        } else {
> +            /* 'Present' & 'Disabled' vCPUs */
> +            virt_setup_lazy_vcpu_realization(cpuobj, vms);
> +        }
> +
> +        /*
> +         * All possible vCPUs should have QOM vCPU Object pointer & arch-id.
> +         * 'cpus_queue' (accessed via qemu_get_cpu()) contains only realized and
> +         * enabled vCPUs. Hence, we must now populate the 'possible_cpus' list.
> +         */
> +        if (kvm_enabled()) {
> +            /*
> +             * Override the default architecture ID with the one retrieved
> +             * from KVM, as they currently differ.
> +             */
> +            machine->possible_cpus->cpus[n].arch_id =
> +                arm_cpu_mp_affinity(ARM_CPU(cs));
> +        }
> +        machine->possible_cpus->cpus[n].cpu = cs;
>       }
>   
>       /* Now we've created the CPUs we can see if they have the hypvirt timer */
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 8502d6216f..5816abae39 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -309,6 +309,23 @@ void qdev_assert_realized_properly(void)
>                                      qdev_assert_realized_properly_cb, NULL);
>   }
>   
> +bool qdev_disable(DeviceState *dev, BusState *bus, Error **errp)
> +{
> +    g_assert(dev);
> +
> +    if (bus) {
> +        error_setg(errp, "Device %s 'disable' operation not supported",
> +                   object_get_typename(OBJECT(dev)));
> +        return false;
> +    }
> +
> +    /* devices like cpu don't have bus */
> +    g_assert(!DEVICE_GET_CLASS(dev)->bus_type);
> +
> +    return object_property_set_str(OBJECT(dev), "admin_power_state", "disabled",
> +                                   errp);
> +}
> +
>   bool qdev_machine_modified(void)
>   {
>       return qdev_hot_added || qdev_hot_removed;
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 3bc212ab3a..2c22b32a3f 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -570,6 +570,25 @@ bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp);
>    */
>   bool qdev_realize_and_unref(DeviceState *dev, BusState *bus, Error **errp);
>   
> +/**
> + * qdev_disable - Initiate administrative disablement and power-off of device
> + * @dev:   The device to be administratively powered off
> + * @bus:   The bus on which the device resides (may be NULL for CPUs)
> + * @errp:  Pointer to a location where an error can be reported
> + *
> + * This function initiates an administrative transition of the device into a
> + * DISABLED state. This may trigger a graceful shutdown process depending on
> + * platform capabilities. For ACPI platforms, this typically involves notifying
> + * the guest via events such as Notify(..., 0x03) and executing _EJx.
> + *
> + * Once completed, the device's operational power is turned off and it is
> + * marked as administratively DISABLED. Further guest usage is blocked until
> + * re-enabled by host-side policy.
> + *
> + * Returns true on success; false if an error occurs, with @errp populated.
> + */
> +bool qdev_disable(DeviceState *dev, BusState *bus, Error **errp);
> +
>   /**
>    * qdev_unrealize: Unrealize a device
>    * @dev: device to unrealize
> diff --git a/include/system/kvm.h b/include/system/kvm.h
> index 3c7d314736..4896a3c9c5 100644
> --- a/include/system/kvm.h
> +++ b/include/system/kvm.h
> @@ -317,6 +317,14 @@ int kvm_create_device(KVMState *s, uint64_t type, bool test);
>    */
>   bool kvm_device_supported(int vmfd, uint64_t type);
>   
> +/**
> + * kvm_create_vcpu - Gets a parked KVM vCPU or creates a KVM vCPU
> + * @cpu: QOM CPUState object for which KVM vCPU has to be fetched/created.
> + *
> + * @returns: 0 when success, errno (<0) when failed.
> + */
> +int kvm_create_vcpu(CPUState *cpu);
> +
>   /**
>    * kvm_park_vcpu - Park QEMU KVM vCPU context
>    * @cpu: QOM CPUState object for which QEMU KVM vCPU context has to be parked.
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 7e0d5b2ed8..a5906d1672 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1500,6 +1500,8 @@ static void arm_cpu_initfn(Object *obj)
>           /* TCG and HVF implement PSCI 1.1 */
>           cpu->psci_version = QEMU_PSCI_VERSION_1_1;
>       }
> +
> +    CPU(obj)->thread_id = 0;
>   }
>   
>   /*
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 6672344855..1962eb29b2 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -991,6 +991,38 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu)
>       write_list_to_cpustate(cpu);
>   }
>   
> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +    unsigned long vcpu_id = cs->cpu_index;
> +    int ret;
> +
> +    ret = kvm_create_vcpu(cs);
> +    if (ret < 0) {
> +        error_report("Failed to create host vcpu %ld", vcpu_id);
> +        abort();
> +    }
> +
> +    /*
> +     * Initialize the vCPU in the host. This will reset the sys regs
> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> +     * get programmed during this call to host. These are referenced
> +     * later while setting device attributes of the GICR during GICv3
> +     * reset.
> +     */
> +    ret = kvm_arch_init_vcpu(cs);
> +    if (ret < 0) {
> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> +        abort();
> +    }
> +
> +    /*
> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> +     * threads are created during realization of ARM vCPUs.
> +     */
> +    kvm_park_vcpu(cs);
> +}
> +

I don't think we can simply call kvm_arch_init_vcpu() in the lazily realized
path. Doing so can trigger an abort (core dump) on my NVIDIA Grace Hopper
machine, where SVE is supported by default.

In the current implementation (without this series), kvm_arch_init_vcpu() is
supposed to be called in the realization path, because the parameters
(features) passed to KVM_ARM_VCPU_INIT are populated at vCPU realization time.

$ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
   --enable-kvm -machine virt,gic-version=3 -cpu host               \
   -smp cpus=4,disabledcpus=2 -m 1024M                              \
   -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
   -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
qemu-system-aarch64: Failed to initialize host vcpu 4
Aborted (core dumped)

Backtrace
=========
(gdb) bt
#0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
#2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
#3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
     at ../target/arm/kvm.c:1081
#4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
     at ../hw/arm/virt.c:2483
#5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
#6  0x0000aaaab160f220 in machine_run_board_init
     (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
#7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
#8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
     at ../system/vl.c:2821
#9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
#10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
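
The ordering problem described above can be illustrated with a toy model. This
is only a sketch under stated assumptions: it assumes that the init call fails
when the vCPU's feature set has not yet been filled in by realization, which is
one plausible reading of the report and not a confirmed root cause. The names
`struct toy_vcpu`, `toy_realize()`, `toy_vcpu_init()` and `TOY_FEAT_SVE` are
hypothetical and do not exist in QEMU or KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FEAT_SVE (1u << 0)  /* hypothetical feature bit for this model */

struct toy_vcpu {
    bool realized;
    uint32_t init_features;     /* only filled in by toy_realize() */
};

/* Realization is where the model populates the init-time feature set. */
static void toy_realize(struct toy_vcpu *v, uint32_t host_features)
{
    v->init_features = host_features;
    v->realized = true;
}

/*
 * Assumption of this model: init is rejected when the vCPU does not carry
 * every feature the host expects it to be initialized with.
 */
static int toy_vcpu_init(const struct toy_vcpu *v, uint32_t host_features)
{
    if ((v->init_features & host_features) != host_features) {
        return -EINVAL;
    }
    return 0;
}
```

In this model, a vCPU that skips realization initializes fine on a host with no
extra features but fails on a host that expects SVE, which mirrors why the
abort shows up only on some machines.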

Thanks,
Gavin

>   /*
>    * Update KVM's MP_STATE based on what QEMU thinks it is
>    */
> @@ -1876,7 +1908,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
>           return -EINVAL;
>       }
>   
> -    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
> +    /*
> +     * Install VM change handler only when vCPU thread has been spawned
> +     * i.e. vCPU is being realized
> +     */
> +    if (cs->thread_id) {
> +        qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
> +    }
>   
>       /* Determine init features for this CPU */
>       memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> index 6a9b6374a6..ec9dc95ee8 100644
> --- a/target/arm/kvm_arm.h
> +++ b/target/arm/kvm_arm.h
> @@ -98,6 +98,17 @@ bool kvm_arm_cpu_post_load(ARMCPU *cpu);
>   void kvm_arm_reset_vcpu(ARMCPU *cpu);
>   
>   struct kvm_vcpu_init;
> +
> +/**
> + * kvm_arm_create_host_vcpu:
> + * @cpu: ARMCPU
> + *
> + * Called to pre-create possible KVM vCPU within the host during the
> + * `virt_machine` initialization phase. This pre-created vCPU will be parked and
> + * will be reused when ARM QOM vCPU is actually hotplugged.
> + */
> +void kvm_arm_create_host_vcpu(ARMCPU *cpu);
> +
>   /**
>    * kvm_arm_create_scratch_host_vcpu:
>    * @fdarray: filled in with kvmfd, vmfd, cpufd file descriptors in that order


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 2 days ago
Hi Gavin,

On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Salil,
>
> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> > From: Salil Mehta <salil.mehta@huawei.com>
> >
> > ARM CPU architecture does not allow CPUs to be plugged after system has
> > initialized. This is a constraint. Hence, the Kernel must know all the CPUs
> > being booted during its initialization. This applies to the Guest Kernel as
> > well and therefore, the number of KVM vCPU descriptors in the host must be
> > fixed at VM initialization time.
> >
> > Also, the GIC must know all the CPUs it is connected to during its
> > initialization, and this cannot change afterward. This must also be ensured
> > during the initialization of the VGIC in KVM. This is necessary because:
> >
> > 1. The association between GICR and MPIDR must be fixed at VM initialization
> >     time. This is represented by the register
> >     `GICR_TYPER(mp_affinity, proc_num)`.
> > 2. Memory regions associated with GICR, etc., cannot be changed (added,
> >     deleted, or modified) after the VM has been initialized. This is not an
> >     ARM architectural constraint but rather invites a difficult and messy
> >     change in VGIC data structures.
> >
> > To enable a hot-add–like model while preserving these constraints, the virt
> > machine may enumerate more CPUs than are enabled at boot using
> > `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> > administratively disabled at init). The topology remains fixed at VM
> > creation time; only the online/offline status may change later.
> >
> > Administratively disabled vCPUs are not realized in QOM until first enabled,
> > avoiding creation of unnecessary vCPU threads at boot. On large systems, this
> > reduces startup time proportionally to the number of disabled vCPUs. Once a
> > QOM vCPU is realized and its thread created, subsequent enable/disable actions
> > do not unrealize it. This behaviour was adopted following review feedback and
> > differs from earlier RFC versions.
> >
> > Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
> > Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
> > Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > ---
> >   accel/kvm/kvm-all.c    |  2 +-
> >   hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> >   hw/core/qdev.c         | 17 ++++++++++
> >   include/hw/qdev-core.h | 19 +++++++++++
> >   include/system/kvm.h   |  8 +++++
> >   target/arm/cpu.c       |  2 ++
> >   target/arm/kvm.c       | 40 +++++++++++++++++++++-
> >   target/arm/kvm_arm.h   | 11 ++++++
> >   8 files changed, 168 insertions(+), 8 deletions(-)
> >
[...]
> >
> > +static void
> > +virt_setup_lazy_vcpu_realization(Object *cpuobj, VirtMachineState *vms)
> > +{
> > +    /*
> > +     * Present & administratively disabled vCPUs:
> > +     *
> > +     * These CPUs are marked offline at init via '-smp disabledcpus=N'. We
> > +     * intentionally do not realize them during the first boot, since it is
> > +     * not known if or when they will ever be enabled. The decision to enable
> > +     * such CPUs depends on policy (e.g. guided by SLAs or other deployment
> > +     * requirements).
> > +     *
> > +     * Realizing all disabled vCPUs up front would make boot time proportional
> > +     * to 'maxcpus', even if policy permits only a small subset to be enabled.
> > +     * This can lead to unacceptable boot delays in some scenarios.
> > +     *
> > +     * Instead, these CPUs remain administratively disabled and unrealized at
> > +     * boot, to be instantiated and brought online only if policy later allows
> > +     * it.
> > +     */
> > +
> > +    /* set this vCPU to be administratively 'disabled' in QOM */
> > +    qdev_disable(DEVICE(cpuobj), NULL, &error_fatal);
> > +
> > +    if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
> > +        object_property_set_int(cpuobj, "psci-conduit", vms->psci_conduit,
> > +                                NULL);
> > +    }
> > +
> > +    /*
> > +     * [!] Constraint: The ARM CPU architecture does not permit new CPUs
> > +     * to be added after system initialization.
> > +     *
> > +     * Workaround: Pre-create KVM vCPUs even for those that are not yet
> > +     * online i.e. powered-off, keeping them `parked` and in an
> > +     * `unrealized (at-least during boot time)` state within QEMU until
> > +     * they are powered-on and made online.
> > +     */
> > +    if (kvm_enabled()) {
> > +        kvm_arm_create_host_vcpu(ARM_CPU(cpuobj));
> > +    }
> > +}
> > +
> >   static void machvirt_init(MachineState *machine)
> >   {
> >       VirtMachineState *vms = VIRT_MACHINE(machine);
> > @@ -2319,10 +2362,6 @@ static void machvirt_init(MachineState *machine)
> >           Object *cpuobj;
> >           CPUState *cs;
> >
> > -        if (n >= smp_cpus) {
> > -            break;
> > -        }
> > -
> >           cpuobj = object_new(possible_cpus->cpus[n].type);
> >           object_property_set_int(cpuobj, "mp-affinity",
> >                                   possible_cpus->cpus[n].arch_id, NULL);
> > @@ -2427,8 +2466,34 @@ static void machvirt_init(MachineState *machine)
> >               }
> >           }
> >
> > -        qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
> > -        object_unref(cpuobj);
> > +        /* start secondary vCPUs in a powered-down state */
> > +        if(n && mc->has_online_capable_cpus) {
> > +            object_property_set_bool(cpuobj, "start-powered-off", true, NULL);
> > +        }
> > +
> > +        if (n < smp_cpus) {
> > +            /* 'Present' & 'Enabled' vCPUs */
> > +            qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
> > +            object_unref(cpuobj);
> > +        } else {
> > +            /* 'Present' & 'Disabled' vCPUs */
> > +            virt_setup_lazy_vcpu_realization(cpuobj, vms);
> > +        }
> > +
> > +        /*
> > +         * All possible vCPUs should have QOM vCPU Object pointer & arch-id.
> > +         * 'cpus_queue' (accessed via qemu_get_cpu()) contains only realized and
> > +         * enabled vCPUs. Hence, we must now populate the 'possible_cpus' list.
> > +         */
> > +        if (kvm_enabled()) {
> > +            /*
> > +             * Override the default architecture ID with the one retrieved
> > +             * from KVM, as they currently differ.
> > +             */
> > +            machine->possible_cpus->cpus[n].arch_id =
> > +                arm_cpu_mp_affinity(ARM_CPU(cs));
> > +        }
> > +        machine->possible_cpus->cpus[n].cpu = cs;
> >       }
> >
> >       /* Now we've created the CPUs we can see if they have the hypvirt timer */
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 8502d6216f..5816abae39 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -309,6 +309,23 @@ void qdev_assert_realized_properly(void)
> >                                      qdev_assert_realized_properly_cb, NULL);
> >   }
> >

[...]

> > +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > +{
> > +    CPUState *cs = CPU(cpu);
> > +    unsigned long vcpu_id = cs->cpu_index;
> > +    int ret;
> > +
> > +    ret = kvm_create_vcpu(cs);
> > +    if (ret < 0) {
> > +        error_report("Failed to create host vcpu %ld", vcpu_id);
> > +        abort();
> > +    }
> > +
> > +    /*
> > +     * Initialize the vCPU in the host. This will reset the sys regs
> > +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> > +     * get programmed during this call to host. These are referenced
> > +     * later while setting device attributes of the GICR during GICv3
> > +     * reset.
> > +     */
> > +    ret = kvm_arch_init_vcpu(cs);
> > +    if (ret < 0) {
> > +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> > +        abort();
> > +    }
> > +
> > +    /*
> > +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> > +     * threads are created during realization of ARM vCPUs.
> > +     */
> > +    kvm_park_vcpu(cs);
> > +}
> > +
>
> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
> SVE is supported by default.

Thanks for reporting this. That is not true. As long as we initialize KVM
correctly and finalize features like SVE, we should be fine. In fact, this is
precisely what we are doing right now.

To understand the crash, I need a bit more info.

1#  Is it happening because KVM_ARM_VCPU_INIT is failing? If yes, then can you
    check within KVM whether it is happening because
    a. the features specified by QEMU do not match the defaults within KVM
       (Hint: check kvm_vcpu_init_check_features()), or
    b. it is complaining about an init feature change
       (kvm_vcpu_init_changed())?
2#  Or is it happening while setting the vector length or finalizing features?

int kvm_arch_init_vcpu(CPUState *cs)
{
    [...]
    /* Do KVM_ARM_VCPU_INIT ioctl */
    ret = kvm_arm_vcpu_init(cpu);                           ---->[1]
    if (ret) {
        return ret;
    }
    if (cpu_isar_feature(aa64_sve, cpu)) {
        ret = kvm_arm_sve_set_vls(cpu);                     ---->[2]
        if (ret) {
            return ret;
        }
        ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE); ---->[3]
        if (ret) {
            return ret;
        }
    }
    [...]
}

I think it's happening because the SVE vector length is left uninitialized.
That initialization happens in arm_cpu_finalize_features(), which I forgot to
call before calling KVM finalize.
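
To make the suspected ordering problem concrete, here is a minimal,
self-contained model in C. It is not QEMU code: MockCpu,
mock_finalize_features(), mock_kvm_init_vcpu() and mock_precreate_vcpu() are
hypothetical stand-ins for the ARMCPU state, arm_cpu_finalize_features(),
kvm_arch_init_vcpu() and the pre-create path, and the -EINVAL returned for an
empty vector-length bitmap is an assumption about how the host reacts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SVE-related part of the vCPU state. */
typedef struct {
    bool has_sve;        /* CPU model advertises SVE */
    uint64_t sve_vq_map; /* bitmap of enabled vector lengths, 0 = not computed */
} MockCpu;

/* Models feature finalization: fills in the vector-length bitmap. */
static void mock_finalize_features(MockCpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->has_sve && cpu->sve_vq_map == 0) {
        cpu->sve_vq_map = 0x1; /* assume only the smallest length is enabled */
    }
}

/* Models the host-side init: SVE needs a non-empty vector-length bitmap. */
static int mock_kvm_init_vcpu(const MockCpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->has_sve && cpu->sve_vq_map == 0) {
        return -EINVAL; /* assumed failure mode for an empty bitmap */
    }
    return 0;
}

/* Returns the init result with or without finalizing features first. */
static int mock_precreate_vcpu(bool finalize_first)
{
    MockCpu cpu = { .has_sve = true, .sve_vq_map = 0 };

    if (finalize_first) {
        mock_finalize_features(&cpu);
    }
    return mock_kvm_init_vcpu(&cpu);
}
```

In this model, mock_precreate_vcpu(false) fails while
mock_precreate_vcpu(true) succeeds, which mirrors the idea of finalizing
features before initializing the vCPU in the host.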

>
> kvm_arch_init_vcpu() is supposed to be called in the realization path in current
> implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
> is populated at vCPU realization time.

Not necessarily. It is just meant to initialize KVM. If we take care of the
KVM requirements in the same way the realize path does, we should be fine.
Can you try adding the patch below to your code and test if it works?

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index c4b68a0b17..1091593478 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
         abort();
     }

+    /* finalize the features like SVE, SME etc */
+    arm_cpu_finalize_features(cpu, &error_abort);
+
     /*
      * Initialize the vCPU in the host. This will reset the sys regs
      * for this vCPU and related registers like MPIDR_EL1 etc. also




>
> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
>    --enable-kvm -machine virt,gic-version=3 -cpu host               \
>    -smp cpus=4,disabledcpus=2 -m 1024M                              \
>    -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
>    -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> qemu-system-aarch64: Failed to initialize host vcpu 4
> Aborted (core dumped)
>
> Backtrace
> =========
> (gdb) bt
> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
>      at ../target/arm/kvm.c:1081
> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
>      at ../hw/arm/virt.c:2483
> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> #6  0x0000aaaab160f220 in machine_run_board_init
>      (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
>      at ../system/vl.c:2821
> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71


Thank you for this. Please let me know if the above fix works, and also share
the return values in case you encounter errors.
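
As a rough sketch of how a return value could be captured in such a report,
the helper below formats the failing vCPU index together with the negative
errno. format_vcpu_init_error() is a hypothetical name and is not part of this
series or of QEMU; it only assumes that the init step returns a negative errno
on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats a failure message that carries the negative
 * errno returned by an init step, so the report shows why the step failed.
 * Returns the number of characters snprintf() would have written.
 */
static int format_vcpu_init_error(char *buf, size_t len,
                                  unsigned long vcpu_id, int ret)
{
    assert(buf != NULL);
    assert(ret < 0); /* callers pass a negative errno value */
    return snprintf(buf, len, "Failed to initialize host vcpu %lu: %s (%d)",
                    vcpu_id, strerror(-ret), ret);
}
```

The text produced by strerror() depends on the platform, so the numeric value
in parentheses is the part worth quoting in a bug report.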

Many thanks!
Salil.


>
> Thanks,
> Gavin
>
> >   /*
> >    * Update KVM's MP_STATE based on what QEMU thinks it is
> >    */
> > @@ -1876,7 +1908,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >           return -EINVAL;
> >       }
> >
> > -    qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
> > +    /*
> > +     * Install VM change handler only when vCPU thread has been spawned
> > +     * i.e. vCPU is being realized
> > +     */
> > +    if (cs->thread_id) {
> > +        qemu_add_vm_change_state_handler(kvm_arm_vm_state_change, cpu);
> > +    }
> >
> >       /* Determine init features for this CPU */
> >       memset(cpu->kvm_init_features, 0, sizeof(cpu->kvm_init_features));
> > diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> > index 6a9b6374a6..ec9dc95ee8 100644
> > --- a/target/arm/kvm_arm.h
> > +++ b/target/arm/kvm_arm.h
> > @@ -98,6 +98,17 @@ bool kvm_arm_cpu_post_load(ARMCPU *cpu);
> >   void kvm_arm_reset_vcpu(ARMCPU *cpu);
> >
> >   struct kvm_vcpu_init;
> > +
> > +/**
> > + * kvm_arm_create_host_vcpu:
> > + * @cpu: ARMCPU
> > + *
> > + * Called to pre-create possible KVM vCPU within the host during the
> > + * `virt_machine` initialization phase. This pre-created vCPU will be parked and
> > + * will be reused when ARM QOM vCPU is actually hotplugged.
> > + */
> > +void kvm_arm_create_host_vcpu(ARMCPU *cpu);
> > +
> >   /**
> >    * kvm_arm_create_scratch_host_vcpu:
> >    * @fdarray: filled in with kvmfd, vmfd, cpufd file descriptors in that order
>
Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 2 days ago
Hi Gavin,

On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
>
> Hi Gavin,
>
> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
> >
> > Hi Salil,
> >
> > On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> > > From: Salil Mehta <salil.mehta@huawei.com>
> > >
> > > ARM CPU architecture does not allow CPUs to be plugged after system has
> > > initialized. This is a constraint. Hence, the Kernel must know all the CPUs
> > > being booted during its initialization. This applies to the Guest Kernel as
> > > well and therefore, the number of KVM vCPU descriptors in the host must be
> > > fixed at VM initialization time.
> > >
> > > Also, the GIC must know all the CPUs it is connected to during its
> > > initialization, and this cannot change afterward. This must also be ensured
> > > during the initialization of the VGIC in KVM. This is necessary because:
> > >
> > > 1. The association between GICR and MPIDR must be fixed at VM initialization
> > >     time. This is represented by the register
> > >     `GICR_TYPER(mp_affinity, proc_num)`.
> > > 2. Memory regions associated with GICR, etc., cannot be changed (added,
> > >     deleted, or modified) after the VM has been initialized. This is not an
> > >     ARM architectural constraint but rather invites a difficult and messy
> > >     change in VGIC data structures.
> > >
> > > To enable a hot-add–like model while preserving these constraints, the virt
> > > machine may enumerate more CPUs than are enabled at boot using
> > > `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> > > administratively disabled at init). The topology remains fixed at VM
> > > creation time; only the online/offline status may change later.
> > >
> > > Administratively disabled vCPUs are not realized in QOM until first enabled,
> > > avoiding creation of unnecessary vCPU threads at boot. On large systems, this
> > > reduces startup time proportionally to the number of disabled vCPUs. Once a
> > > QOM vCPU is realized and its thread created, subsequent enable/disable actions
> > > do not unrealize it. This behaviour was adopted following review feedback and
> > > differs from earlier RFC versions.
> > >
> > > Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
> > > Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
> > > Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > > ---
> > >   accel/kvm/kvm-all.c    |  2 +-
> > >   hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> > >   hw/core/qdev.c         | 17 ++++++++++
> > >   include/hw/qdev-core.h | 19 +++++++++++
> > >   include/system/kvm.h   |  8 +++++
> > >   target/arm/cpu.c       |  2 ++
> > >   target/arm/kvm.c       | 40 +++++++++++++++++++++-
> > >   target/arm/kvm_arm.h   | 11 ++++++
> > >   8 files changed, 168 insertions(+), 8 deletions(-)
> > >

[...]

> > > +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > > +{
> > > +    CPUState *cs = CPU(cpu);
> > > +    unsigned long vcpu_id = cs->cpu_index;
> > > +    int ret;
> > > +
> > > +    ret = kvm_create_vcpu(cs);
> > > +    if (ret < 0) {
> > > +        error_report("Failed to create host vcpu %ld", vcpu_id);
> > > +        abort();
> > > +    }
> > > +
> > > +    /*
> > > +     * Initialize the vCPU in the host. This will reset the sys regs
> > > +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> > > +     * get programmed during this call to host. These are referenced
> > > +     * later while setting device attributes of the GICR during GICv3
> > > +     * reset.
> > > +     */
> > > +    ret = kvm_arch_init_vcpu(cs);
> > > +    if (ret < 0) {
> > > +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> > > +        abort();
> > > +    }
> > > +
> > > +    /*
> > > +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> > > +     * threads are created during realization of ARM vCPUs.
> > > +     */
> > > +    kvm_park_vcpu(cs);
> > > +}
> > > +
> >
> > I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
> > path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
> > SVE is supported by default.
>
> Thanks for reporting this. That is not true. As long as we initialize
> KVM correctly and
> finalize the features like SVE we should be fine. In fact, this is
> precisely what we are
> doing right now.
>
> To understand the crash, I need a bit more info.
>
> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you check
>       within the KVM if it is happening because
>      a.  features specified by QEMU are not matching the defaults within the KVM
>            (HInt: check kvm_vcpu_init_check_features())?
>      b. or complaining about init feate change kvm_vcpu_init_changed()?
> 2#  or it is happening during the setting of vector length or
> finalizing features?
>
> int kvm_arch_init_vcpu(CPUState *cs)
> {
>    [...]
>          /* Do KVM_ARM_VCPU_INIT ioctl */
>         ret = kvm_arm_vcpu_init(cpu);   ---->[1]
>         if (ret) {
>            return ret;
>        }
>           if (cpu_isar_feature(aa64_sve, cpu)) {
>         ret = kvm_arm_sve_set_vls(cpu); ---->[2]
>         if (ret) {
>             return ret;
>         }
>         ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
>         if (ret) {
>             return ret;
>         }
>     }
> [...]
> }
>
> I think it's happening because vector length is going uninitialized.
> This initialization
> happens in context to  arm_cpu_finalize_features() which I forgot to call before
> calling KVM finalize.
>
> >
> > kvm_arch_init_vcpu() is supposed to be called in the realization path in current
> > implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
> > is populated at vCPU realization time.
>
> Not necessarily. It is just meant to initialize the KVM. If we take care of the
> KVM requirements in the similar way the realize path does we should be
> fine. Can you try to add the patch below in your code and test if it works?
>
>  diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index c4b68a0b17..1091593478 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>          abort();
>      }
>
> +     /* finalize the features like SVE, SME etc */
> +     arm_cpu_finalize_features(cpu, &error_abort);
> +
>      /*
>       * Initialize the vCPU in the host. This will reset the sys regs
>       * for this vCPU and related registers like MPIDR_EL1 etc. also
>
>
>
>
> >
> > $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
> >    --enable-kvm -machine virt,gic-version=3 -cpu host               \
> >    -smp cpus=4,disabledcpus=2 -m 1024M                              \
> >    -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
> >    -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> > qemu-system-aarch64: Failed to initialize host vcpu 4
> > Aborted (core dumped)
> >
> > Backtrace
> > =========
> > (gdb) bt
> > #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> > #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> > #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> > #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
> >      at ../target/arm/kvm.c:1081
> > #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> >      at ../hw/arm/virt.c:2483
> > #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> > #6  0x0000aaaab160f220 in machine_run_board_init
> >      (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
> > #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> > #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
> >      at ../system/vl.c:2821
> > #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> > #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
>
>
> Thank you for this. Please let me know if the above fix works and also
> the return values in
> case you encounter errors.

I've pushed the fix to the branch below for your convenience:

Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669

Thanks
Salil.
Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Gavin Shan 3 weeks, 1 day ago
Hi Salil,

On 10/23/25 4:50 AM, Salil Mehta wrote:
> On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
>> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> Hi Salil,
>>>
>>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
>>>> From: Salil Mehta <salil.mehta@huawei.com>
>>>>
>>>> ARM CPU architecture does not allow CPUs to be plugged after system has
>>>> initialized. This is a constraint. Hence, the Kernel must know all the CPUs
>>>> being booted during its initialization. This applies to the Guest Kernel as
>>>> well and therefore, the number of KVM vCPU descriptors in the host must be
>>>> fixed at VM initialization time.
>>>>
>>>> Also, the GIC must know all the CPUs it is connected to during its
>>>> initialization, and this cannot change afterward. This must also be ensured
>>>> during the initialization of the VGIC in KVM. This is necessary because:
>>>>
>>>> 1. The association between GICR and MPIDR must be fixed at VM initialization
>>>>      time. This is represented by the register
>>>>      `GICR_TYPER(mp_affinity, proc_num)`.
>>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
>>>>      deleted, or modified) after the VM has been initialized. This is not an
>>>>      ARM architectural constraint but rather invites a difficult and messy
>>>>      change in VGIC data structures.
>>>>
>>>> To enable a hot-add–like model while preserving these constraints, the virt
>>>> machine may enumerate more CPUs than are enabled at boot using
>>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
>>>> administratively disabled at init). The topology remains fixed at VM
>>>> creation time; only the online/offline status may change later.
>>>>
>>>> Administratively disabled vCPUs are not realized in QOM until first enabled,
>>>> avoiding creation of unnecessary vCPU threads at boot. On large systems, this
>>>> reduces startup time proportionally to the number of disabled vCPUs. Once a
>>>> QOM vCPU is realized and its thread created, subsequent enable/disable actions
>>>> do not unrealize it. This behaviour was adopted following review feedback and
>>>> differs from earlier RFC versions.
>>>>
>>>> Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
>>>> Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
>>>> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
>>>> ---
>>>>    accel/kvm/kvm-all.c    |  2 +-
>>>>    hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
>>>>    hw/core/qdev.c         | 17 ++++++++++
>>>>    include/hw/qdev-core.h | 19 +++++++++++
>>>>    include/system/kvm.h   |  8 +++++
>>>>    target/arm/cpu.c       |  2 ++
>>>>    target/arm/kvm.c       | 40 +++++++++++++++++++++-
>>>>    target/arm/kvm_arm.h   | 11 ++++++
>>>>    8 files changed, 168 insertions(+), 8 deletions(-)
>>>>
> 
> [...]
> 
>>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>>> +{
>>>> +    CPUState *cs = CPU(cpu);
>>>> +    unsigned long vcpu_id = cs->cpu_index;
>>>> +    int ret;
>>>> +
>>>> +    ret = kvm_create_vcpu(cs);
>>>> +    if (ret < 0) {
>>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
>>>> +        abort();
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Initialize the vCPU in the host. This will reset the sys regs
>>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
>>>> +     * get programmed during this call to host. These are referenced
>>>> +     * later while setting device attributes of the GICR during GICv3
>>>> +     * reset.
>>>> +     */
>>>> +    ret = kvm_arch_init_vcpu(cs);
>>>> +    if (ret < 0) {
>>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
>>>> +        abort();
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
>>>> +     * threads are created during realization of ARM vCPUs.
>>>> +     */
>>>> +    kvm_park_vcpu(cs);
>>>> +}
>>>> +
>>>
>>> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
>>> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
>>> SVE is supported by default.
>>
>> Thanks for reporting this. That is not true. As long as we initialize
>> KVM correctly and
>> finalize the features like SVE we should be fine. In fact, this is
>> precisely what we are
>> doing right now.
>>
>> To understand the crash, I need a bit more info.
>>
>> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you check
>>        within the KVM if it is happening because
>>       a.  features specified by QEMU are not matching the defaults within the KVM
>>             (HInt: check kvm_vcpu_init_check_features())?
>>       b. or complaining about init feate change kvm_vcpu_init_changed()?
>> 2#  or it is happening during the setting of vector length or
>> finalizing features?
>>
>> int kvm_arch_init_vcpu(CPUState *cs)
>> {
>>     [...]
>>           /* Do KVM_ARM_VCPU_INIT ioctl */
>>          ret = kvm_arm_vcpu_init(cpu);   ---->[1]
>>          if (ret) {
>>             return ret;
>>         }
>>            if (cpu_isar_feature(aa64_sve, cpu)) {
>>          ret = kvm_arm_sve_set_vls(cpu); ---->[2]
>>          if (ret) {
>>              return ret;
>>          }
>>          ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
>>          if (ret) {
>>              return ret;
>>          }
>>      }
>> [...]
>> }
>>
>> I think it's happening because vector length is going uninitialized.
>> This initialization
>> happens in context to  arm_cpu_finalize_features() which I forgot to call before
>> calling KVM finalize.
>>
>>>
>>> kvm_arch_init_vcpu() is supposed to be called in the realization path in current
>>> implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
>>> is populated at vCPU realization time.
>>
>> Not necessarily. It is just meant to initialize the KVM. If we take care of the
>> KVM requirements in the similar way the realize path does we should be
>> fine. Can you try to add the patch below in your code and test if it works?
>>
>>   diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index c4b68a0b17..1091593478 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>           abort();
>>       }
>>
>> +     /* finalize the features like SVE, SME etc */
>> +     arm_cpu_finalize_features(cpu, &error_abort);
>> +
>>       /*
>>        * Initialize the vCPU in the host. This will reset the sys regs
>>        * for this vCPU and related registers like MPIDR_EL1 etc. also
>>
>>
>>
>>
>>>
>>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
>>>     --enable-kvm -machine virt,gic-version=3 -cpu host               \
>>>     -smp cpus=4,disabledcpus=2 -m 1024M                              \
>>>     -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
>>>     -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
>>> qemu-system-aarch64: Failed to initialize host vcpu 4
>>> Aborted (core dumped)
>>>
>>> Backtrace
>>> =========
>>> (gdb) bt
>>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
>>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
>>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
>>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
>>>       at ../target/arm/kvm.c:1081
>>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
>>>       at ../hw/arm/virt.c:2483
>>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
>>> #6  0x0000aaaab160f220 in machine_run_board_init
>>>       (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
>>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
>>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
>>>       at ../system/vl.c:2821
>>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
>>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
>>
>>
>> Thank you for this. Please let me know if the above fix works and also
>> the return values in
>> case you encounter errors.
> 
> I've pushed the fix to below branch for your convenience:
> 
> Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> 

I guess the rfc-v6.2 branch isn't ready for testing yet, because it runs into
another crash, shown below.

host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
       -accel kvm -machine virt,gic-version=host,nvdimm=on                         \
       -cpu host,sve=on                                                            \
       -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
       -m 4096M,slots=16,maxmem=128G                                               \
       -object memory-backend-ram,id=mem0,size=2048M                               \
       -object memory-backend-ram,id=mem1,size=2048M                               \
       -numa node,nodeid=0,memdev=mem0,cpus=0-1                                    \
       -numa node,nodeid=1,memdev=mem1,cpus=2-3                                    \
       -L /home/gavin/sandbox/qemu.main/build/pc-bios                              \
       -monitor none -serial mon:stdio -nographic -gdb tcp::6666                   \
       -qmp tcp:localhost:5555,server,wait=off                                     \
       -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd      \
       -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
       -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
       -append memhp_default_state=online_movable
         :
         :
guest$ cd /sys/devices/system/cpu/
guest$ cat present enabled online
0-3
0-1
0-1
(qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted

I also picked the fix (the last patch in the rfc-v6.2 branch) onto the rfc-v6
branch, and the same crash can be seen there.

root@nvidia-grace-hopper-01:/home/gavin/sandbox/qemu.main# git log --oneline HEAD | head -n 1
82dbd9a8f6 tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc
root@nvidia-grace-hopper-01:/home/gavin/sandbox/qemu.main# git diff
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 254303727b..c4f89e7db6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2470,6 +2470,9 @@ virt_setup_lazy_vcpu_realization(Object *cpuobj, VirtMachineState *vms)
      /* set operational state of disabled CPUs as OFF */
      ARM_CPU(cpuobj)->power_state = PSCI_OFF;
  
+    /* finalize the features like SVE, SME etc */
+    arm_cpu_finalize_features(ARM_CPU(cpuobj), &error_fatal);
+
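
One possible reading of the "Operation not permitted" error, which is not
confirmed anywhere in this thread: the KVM API documentation lists EPERM for
KVM_ARM_VCPU_FINALIZE when a feature is already finalized, so running the init
sequence a second time on a pre-created, parked vCPU might fail in this way.
The model below only illustrates that one-shot behaviour. MockHostVcpu,
mock_finalize_sve() and mock_init_vcpu() are hypothetical names, not KVM or
QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host-side SVE state of one vCPU. */
typedef struct {
    bool sve_finalized;
} MockHostVcpu;

/* Models a one-shot finalize step: a second call is refused. */
static int mock_finalize_sve(MockHostVcpu *vcpu)
{
    assert(vcpu != NULL);
    if (vcpu->sve_finalized) {
        return -EPERM; /* already finalized */
    }
    vcpu->sve_finalized = true;
    return 0;
}

/* Models an init sequence that always tries to finalize SVE. */
static int mock_init_vcpu(MockHostVcpu *vcpu)
{
    return mock_finalize_sve(vcpu);
}
```

In this model the first mock_init_vcpu() call on a vCPU succeeds and the
second returns -EPERM. Whether this matches the failure above would need to be
checked against the return values seen on the host.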

Thanks,
Gavin


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 1 day ago
Hi Gavin,

On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Salil,
>
> On 10/23/25 4:50 AM, Salil Mehta wrote:
> > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
> >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
> >>>
> >>> Hi Salil,
> >>>
> >>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> >>>> From: Salil Mehta <salil.mehta@huawei.com>
> >>>>
> >>>> ARM CPU architecture does not allow CPUs to be plugged after system has
> >>>> initialized. This is a constraint. Hence, the Kernel must know all the CPUs
> >>>> being booted during its initialization. This applies to the Guest Kernel as
> >>>> well and therefore, the number of KVM vCPU descriptors in the host must be
> >>>> fixed at VM initialization time.
> >>>>
> >>>> Also, the GIC must know all the CPUs it is connected to during its
> >>>> initialization, and this cannot change afterward. This must also be ensured
> >>>> during the initialization of the VGIC in KVM. This is necessary because:
> >>>>
> >>>> 1. The association between GICR and MPIDR must be fixed at VM initialization
> >>>>      time. This is represented by the register
> >>>>      `GICR_TYPER(mp_affinity, proc_num)`.
> >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
> >>>>      deleted, or modified) after the VM has been initialized. This is not an
> >>>>      ARM architectural constraint but rather invites a difficult and messy
> >>>>      change in VGIC data structures.
> >>>>
> >>>> To enable a hot-add–like model while preserving these constraints, the virt
> >>>> machine may enumerate more CPUs than are enabled at boot using
> >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> >>>> administratively disabled at init). The topology remains fixed at VM
> >>>> creation time; only the online/offline status may change later.
> >>>>
> >>>> Administratively disabled vCPUs are not realized in QOM until first enabled,
> >>>> avoiding creation of unnecessary vCPU threads at boot. On large systems, this
> >>>> reduces startup time proportionally to the number of disabled vCPUs. Once a
> >>>> QOM vCPU is realized and its thread created, subsequent enable/disable actions
> >>>> do not unrealize it. This behaviour was adopted following review feedback and
> >>>> differs from earlier RFC versions.
> >>>>
> >>>> Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
> >>>> Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
> >>>> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> >>>> ---
> >>>>    accel/kvm/kvm-all.c    |  2 +-
> >>>>    hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> >>>>    hw/core/qdev.c         | 17 ++++++++++
> >>>>    include/hw/qdev-core.h | 19 +++++++++++
> >>>>    include/system/kvm.h   |  8 +++++
> >>>>    target/arm/cpu.c       |  2 ++
> >>>>    target/arm/kvm.c       | 40 +++++++++++++++++++++-
> >>>>    target/arm/kvm_arm.h   | 11 ++++++
> >>>>    8 files changed, 168 insertions(+), 8 deletions(-)
> >>>>
> >
> > [...]
> >
> >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>>> +{
> >>>> +    CPUState *cs = CPU(cpu);
> >>>> +    unsigned long vcpu_id = cs->cpu_index;
> >>>> +    int ret;
> >>>> +
> >>>> +    ret = kvm_create_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * Initialize the vCPU in the host. This will reset the sys regs
> >>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>>> +     * get programmed during this call to host. These are referenced
> >>>> +     * later while setting device attributes of the GICR during GICv3
> >>>> +     * reset.
> >>>> +     */
> >>>> +    ret = kvm_arch_init_vcpu(cs);
> >>>> +    if (ret < 0) {
> >>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> >>>> +        abort();
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> >>>> +     * threads are created during realization of ARM vCPUs.
> >>>> +     */
> >>>> +    kvm_park_vcpu(cs);
> >>>> +}
> >>>> +
> >>>
> >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
> >>> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
> >>> SVE is supported by default.
> >>
> >> Thanks for reporting this. That is not true. As long as we initialize
> >> KVM correctly and
> >> finalize the features like SVE we should be fine. In fact, this is
> >> precisely what we are
> >> doing right now.
> >>
> >> To understand the crash, I need a bit more info.
> >>
> >> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you check
> >>        within the KVM if it is happening because
> >>       a.  features specified by QEMU are not matching the defaults within the KVM
> >>             (HInt: check kvm_vcpu_init_check_features())?
> >>       b. or complaining about init feate change kvm_vcpu_init_changed()?
> >> 2#  or it is happening during the setting of vector length or
> >> finalizing features?
> >>
> >> int kvm_arch_init_vcpu(CPUState *cs)
> >> {
> >>     [...]
> >>           /* Do KVM_ARM_VCPU_INIT ioctl */
> >>          ret = kvm_arm_vcpu_init(cpu);   ---->[1]
> >>          if (ret) {
> >>             return ret;
> >>         }
> >>            if (cpu_isar_feature(aa64_sve, cpu)) {
> >>          ret = kvm_arm_sve_set_vls(cpu); ---->[2]
> >>          if (ret) {
> >>              return ret;
> >>          }
> >>          ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
> >>          if (ret) {
> >>              return ret;
> >>          }
> >>      }
> >> [...]
> >> }
> >>
> >> I think it's happening because vector length is going uninitialized.
> >> This initialization
> >> happens in context to  arm_cpu_finalize_features() which I forgot to call before
> >> calling KVM finalize.
> >>
> >>>
> >>> kvm_arch_init_vcpu() is supposed to be called in the realization path in current
> >>> implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
> >>> is populated at vCPU realization time.
> >>
> >> Not necessarily. It is just meant to initialize the KVM. If we take care of the
> >> KVM requirements in the similar way the realize path does we should be
> >> fine. Can you try to add the patch below in your code and test if it works?
> >>
> >>   diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> >> index c4b68a0b17..1091593478 100644
> >> --- a/target/arm/kvm.c
> >> +++ b/target/arm/kvm.c
> >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> >>           abort();
> >>       }
> >>
> >> +     /* finalize the features like SVE, SME etc */
> >> +     arm_cpu_finalize_features(cpu, &error_abort);
> >> +
> >>       /*
> >>        * Initialize the vCPU in the host. This will reset the sys regs
> >>        * for this vCPU and related registers like MPIDR_EL1 etc. also
> >>
> >>
> >>
> >>
> >>>
> >>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
> >>>     --enable-kvm -machine virt,gic-version=3 -cpu host               \
> >>>     -smp cpus=4,disabledcpus=2 -m 1024M                              \
> >>>     -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
> >>>     -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> >>> qemu-system-aarch64: Failed to initialize host vcpu 4
> >>> Aborted (core dumped)
> >>>
> >>> Backtrace
> >>> =========
> >>> (gdb) bt
> >>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> >>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> >>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> >>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
> >>>       at ../target/arm/kvm.c:1081
> >>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> >>>       at ../hw/arm/virt.c:2483
> >>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> >>> #6  0x0000aaaab160f220 in machine_run_board_init
> >>>       (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
> >>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> >>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
> >>>       at ../system/vl.c:2821
> >>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
> >>
> >>
> >> Thank you for this. Please let me know if the above fix works and also
> >> the return values in
> >> case you encounter errors.
> >
> > I've pushed the fix to below branch for your convenience:
> >
> > Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> > Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> >
>
> I guess rfc-v6.2 branch isn't ready for test because it runs into another crash
> dump with rfc-v6.2 branch, like below.


rfc-v6.2 is not crashing on the Kunpeng920 where I tested. But this
chip lacks some ARM extensions such as SVE, so unfortunately
I can't test SVE/SME/PAuth support.

Can you disable SVE and check whether it comes up, just to narrow
down the case?

>
> host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
>        -accel kvm -machine virt,gic-version=host,nvdimm=on                         \
>        -cpu host,sve=on                                                            \
>        -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
>        -m 4096M,slots=16,maxmem=128G                                               \
>        -object memory-backend-ram,id=mem0,size=2048M                               \
>        -object memory-backend-ram,id=mem1,size=2048M                               \
>        -numa node,nodeid=0,memdev=mem0,cpus=0-1                                    \
>        -numa node,nodeid=1,memdev=mem1,cpus=2-3                                    \
>        -L /home/gavin/sandbox/qemu.main/build/pc-bios                              \
>        -monitor none -serial mon:stdio -nographic -gdb tcp::6666                   \
>        -qmp tcp:localhost:5555,server,wait=off                                     \
>        -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd      \
>        -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
>        -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
>        -append memhp_default_state=online_movable
>          :
>          :
> guest$ cd /sys/devices/system/cpu/
> guest$ cat present enabled online
> 0-3
> 0-1
> 0-1
> (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted


Ah, I see. I think I understand the issue. It's complaining
about calling finalize twice. Could you check this, as
I do not have a way to test it?


int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
{
    switch (feature) {
    case KVM_ARM_VCPU_SVE:
        [...]
        if (kvm_arm_vcpu_sve_finalized(vcpu))
            return -EPERM;   -----> is this where it is failing?
        [...]
}
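
To make the suspicion concrete, here is a minimal standalone model of the
finalize-once behaviour I have in mind. It is only a sketch: `struct toy_vcpu`
and `toy_vcpu_finalize_sve()` are made-up names for illustration, not the real
kernel structures or functions, and the -EINVAL branch is my assumption about
how a vCPU without SVE would be rejected.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-vCPU state; not the real struct kvm_vcpu. */
struct toy_vcpu {
    bool has_sve;        /* SVE was requested when the vCPU was initialized */
    bool sve_finalized;  /* set by the first successful finalize call */
};

/*
 * Models the shape of the check quoted above: the first finalize succeeds,
 * any later finalize on the same vCPU is rejected with -EPERM.
 */
static int toy_vcpu_finalize_sve(struct toy_vcpu *vcpu)
{
    if (!vcpu->has_sve) {
        return -EINVAL;  /* assumption: nothing to finalize without SVE */
    }
    if (vcpu->sve_finalized) {
        return -EPERM;   /* already finalized once */
    }
    vcpu->sve_finalized = true;
    return 0;
}
```

If this model matches what KVM does, a second call on a pre-created vCPU would
return -EPERM, which is consistent with the "Operation not permitted" message
in your log.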


>
> I picked the fix (the last patch in rfc-v6.2 branch) to rfc-v6 branch, same crash dump
> can be seen.

Are you getting the previously reported abort or the new problem above?


Thanks
Salil.


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Gavin Shan 3 weeks, 1 day ago
Hi Salil,

On 10/23/25 10:35 AM, Salil Mehta wrote:
> On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <gshan@redhat.com> wrote:
>> On 10/23/25 4:50 AM, Salil Mehta wrote:
>>> On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
>>>> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
>>>>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
>>>>>> From: Salil Mehta <salil.mehta@huawei.com>

[...]

>>>>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>>>>> +{
>>>>>> +    CPUState *cs = CPU(cpu);
>>>>>> +    unsigned long vcpu_id = cs->cpu_index;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    ret = kvm_create_vcpu(cs);
>>>>>> +    if (ret < 0) {
>>>>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
>>>>>> +        abort();
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Initialize the vCPU in the host. This will reset the sys regs
>>>>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
>>>>>> +     * get programmed during this call to host. These are referenced
>>>>>> +     * later while setting device attributes of the GICR during GICv3
>>>>>> +     * reset.
>>>>>> +     */
>>>>>> +    ret = kvm_arch_init_vcpu(cs);
>>>>>> +    if (ret < 0) {
>>>>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
>>>>>> +        abort();
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
>>>>>> +     * threads are created during realization of ARM vCPUs.
>>>>>> +     */
>>>>>> +    kvm_park_vcpu(cs);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
>>>>> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
>>>>> SVE is supported by default.
>>>>
>>>> Thanks for reporting this. That is not true. As long as we initialize
>>>> KVM correctly and
>>>> finalize the features like SVE we should be fine. In fact, this is
>>>> precisely what we are
>>>> doing right now.
>>>>
>>>> To understand the crash, I need a bit more info.
>>>>
>>>> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you check
>>>>         within the KVM if it is happening because
>>>>        a.  features specified by QEMU are not matching the defaults within the KVM
>>>>              (HInt: check kvm_vcpu_init_check_features())?
>>>>        b. or complaining about init feate change kvm_vcpu_init_changed()?
>>>> 2#  or it is happening during the setting of vector length or
>>>> finalizing features?
>>>>
>>>> int kvm_arch_init_vcpu(CPUState *cs)
>>>> {
>>>>      [...]
>>>>            /* Do KVM_ARM_VCPU_INIT ioctl */
>>>>           ret = kvm_arm_vcpu_init(cpu);   ---->[1]
>>>>           if (ret) {
>>>>              return ret;
>>>>          }
>>>>             if (cpu_isar_feature(aa64_sve, cpu)) {
>>>>           ret = kvm_arm_sve_set_vls(cpu); ---->[2]
>>>>           if (ret) {
>>>>               return ret;
>>>>           }
>>>>           ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
>>>>           if (ret) {
>>>>               return ret;
>>>>           }
>>>>       }
>>>> [...]
>>>> }
>>>>
>>>> I think it's happening because vector length is going uninitialized.
>>>> This initialization
>>>> happens in context to  arm_cpu_finalize_features() which I forgot to call before
>>>> calling KVM finalize.
>>>>
>>>>>
>>>>> kvm_arch_init_vcpu() is supposed to be called in the realization path in current
>>>>> implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
>>>>> is populated at vCPU realization time.
>>>>
>>>> Not necessarily. It is just meant to initialize the KVM. If we take care of the
>>>> KVM requirements in the similar way the realize path does we should be
>>>> fine. Can you try to add the patch below in your code and test if it works?
>>>>
>>>>    diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>>> index c4b68a0b17..1091593478 100644
>>>> --- a/target/arm/kvm.c
>>>> +++ b/target/arm/kvm.c
>>>> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
>>>>            abort();
>>>>        }
>>>>
>>>> +     /* finalize the features like SVE, SME etc */
>>>> +     arm_cpu_finalize_features(cpu, &error_abort);
>>>> +
>>>>        /*
>>>>         * Initialize the vCPU in the host. This will reset the sys regs
>>>>         * for this vCPU and related registers like MPIDR_EL1 etc. also
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
>>>>>      --enable-kvm -machine virt,gic-version=3 -cpu host               \
>>>>>      -smp cpus=4,disabledcpus=2 -m 1024M                              \
>>>>>      -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
>>>>>      -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
>>>>> qemu-system-aarch64: Failed to initialize host vcpu 4
>>>>> Aborted (core dumped)
>>>>>
>>>>> Backtrace
>>>>> =========
>>>>> (gdb) bt
>>>>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
>>>>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
>>>>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
>>>>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
>>>>>        at ../target/arm/kvm.c:1081
>>>>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
>>>>>        at ../hw/arm/virt.c:2483
>>>>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
>>>>> #6  0x0000aaaab160f220 in machine_run_board_init
>>>>>        (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
>>>>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
>>>>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
>>>>>        at ../system/vl.c:2821
>>>>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
>>>>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
>>>>
>>>>
>>>> Thank you for this. Please let me know if the above fix works and also
>>>> the return values in
>>>> case you encounter errors.
>>>
>>> I've pushed the fix to below branch for your convenience:
>>>
>>> Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
>>> Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
>>>
>>
>> I guess rfc-v6.2 branch isn't ready for test because it runs into another crash
>> dump with rfc-v6.2 branch, like below.
> 
> 
> rfc-6.2 is not crashing on Kunpeng920 where I tested. But this
> chip does not have some ARM extensions like SVE etc so
> Unfortunately, I can't test SVE/SME/PAuth etc support.
> 
> Can you disable SVE and then try if it comes up just to corner
> the case?
> 

Right, this crash dump shouldn't be encountered if SVE isn't supported. I already
had the workaround "-cpu host,sve=off" to keep my tests moving forwards...

>>
>> host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
>>         -accel kvm -machine virt,gic-version=host,nvdimm=on                         \
>>         -cpu host,sve=on                                                            \
>>         -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
>>         -m 4096M,slots=16,maxmem=128G                                               \
>>         -object memory-backend-ram,id=mem0,size=2048M                               \
>>         -object memory-backend-ram,id=mem1,size=2048M                               \
>>         -numa node,nodeid=0,memdev=mem0,cpus=0-1                                    \
>>         -numa node,nodeid=1,memdev=mem1,cpus=2-3                                    \
>>         -L /home/gavin/sandbox/qemu.main/build/pc-bios                              \
>>         -monitor none -serial mon:stdio -nographic -gdb tcp::6666                   \
>>         -qmp tcp:localhost:5555,server,wait=off                                     \
>>         -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd      \
>>         -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
>>         -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
>>         -append memhp_default_state=online_movable
>>           :
>>           :
>> guest$ cd /sys/devices/system/cpu/
>> guest$ cat present enabled online
>> 0-3
>> 0-1
>> 0-1
>> (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
>> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
> 
> 
> Ah, I see. I think I understand the issue. It's complaining
> about calling the  finalize twice. Is it possible to check as
> I do not have a way to test it?
> 
> 
> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> {
> switch (feature) {
> case KVM_ARM_VCPU_SVE:
> [...]
> if (kvm_arm_vcpu_sve_finalized(vcpu))
> return -EPERM;-----> this where it must be popping?
> [...]
> }
> 

Right, I think that's the case: QEMU tries to finalize the SVE capability twice,
which is the real problem. I explain what I found below, which should be
helpful for the forthcoming revisions.

machvirt_init
   virt_setup_lazy_vcpu_realization
     arm_cpu_finalize_features
     kvm_arm_create_host_vcpu
       kvm_create_vcpu                       // New fd is created
       kvm_arch_init_vcpu
         kvm_arm_vcpu_init
         kvm_arm_sve_set_vls
         kvm_arm_vcpu_finalize               // (A) SVE capability is finalized

device_set_admin_power_state
   device_pre_poweron
     virt_machine_device_pre_poweron
       virt_cpu_pre_poweron
         qdev_realize
           arm_cpu_realizefn
             cpu_exec_realizefn
             arm_cpu_finalize_features       // Called for the second time
             qemu_init_vcpu
               kvm_start_vcpu_thread
                 kvm_vcpu_thread_fn
                   kvm_init_vcpu
                     kvm_create_vcpu         // Called for the second time
                     kvm_arch_init_vcpu      // Called for the second time
                       kvm_arm_vcpu_init
                       kvm_arm_sve_set_vls   // (B) Failed here
                       kvm_arm_vcpu_finalize

(B) is where we try to finalize the SVE capability again. It has already been finalized at (A).
     Finalizing the SVE capability twice is disallowed by KVM on the host side.
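
The two paths above can be modelled in a few lines of standalone C. This is
only a sketch under my own assumptions: the `toy_*` names and the `arch_inited`
flag are hypothetical and do not exist in QEMU or KVM. It shows one possible
direction, where the realize path skips the host-side init for a vCPU that was
already initialized at machine init, rather than the actual fix.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy host-side vCPU: finalize is allowed exactly once. */
struct toy_host_vcpu {
    bool sve_finalized;
};

/* Toy QEMU-side vCPU; 'arch_inited' is a hypothetical flag. */
struct toy_qemu_vcpu {
    struct toy_host_vcpu host;
    bool arch_inited;  /* set once the host vCPU has been initialized */
};

/* Stand-in for the host finalize step, as at (A) and (B) in the trace. */
static int toy_host_finalize(struct toy_host_vcpu *h)
{
    if (h->sve_finalized) {
        return -EPERM;  /* second finalize is rejected */
    }
    h->sve_finalized = true;
    return 0;
}

/*
 * Stand-in for the arch init step. The first call (machine init) finalizes
 * the host vCPU; later calls (realize path) return early and never reach
 * the host finalize again.
 */
static int toy_arch_init_once(struct toy_qemu_vcpu *cpu)
{
    int ret;

    if (cpu->arch_inited) {
        return 0;  /* pre-created at machine init, nothing left to do */
    }
    ret = toy_host_finalize(&cpu->host);
    if (ret < 0) {
        return ret;
    }
    cpu->arch_inited = true;
    return 0;
}
```

In this model, calling toy_arch_init_once() twice succeeds both times, while
calling toy_host_finalize() directly a second time fails with -EPERM, which is
the situation at (B).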


>>
>> I picked the fix (the last patch in rfc-v6.2 branch) to rfc-v6 branch, same crash dump
>> can be seen.
> 
> Are you getting previously reported abort or above new problem?
> 

Previously, the VM couldn't be started. After your fix is applied, the VM is able to start.
The new problem is that a QEMU crash dump is seen on an attempt to hot-add a vCPU.

Thanks,
Gavin


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 1 day ago
Hi Gavin

On Thu, Oct 23, 2025 at 1:58 AM Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Salil,
>
> On 10/23/25 10:35 AM, Salil Mehta wrote:
> > On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <gshan@redhat.com> wrote:
> >> On 10/23/25 4:50 AM, Salil Mehta wrote:
> >>> On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
> >>>> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
> >>>>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> >>>>>> From: Salil Mehta <salil.mehta@huawei.com>

[...]

> >> guest$ cd /sys/devices/system/cpu/
> >> guest$ cat present enabled online
> >> 0-3
> >> 0-1
> >> 0-1
> >> (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> >> qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
> >
> >
> > Ah, I see. I think I understand the issue. It's complaining
> > about calling the  finalize twice. Is it possible to check as
> > I do not have a way to test it?
> >
> >
> > int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> > {
> > switch (feature) {
> > case KVM_ARM_VCPU_SVE:
> > [...]
> > if (kvm_arm_vcpu_sve_finalized(vcpu))
> > return -EPERM;-----> this where it must be popping?
> > [...]
> > }
> >
>
> Right, I think that's the case: QEMU tries to finalize SVE capability for twice,
> which is the real problem. I'm explaining what I found as below, which would be
> helpful to the forthcoming revisions.
>
> machvirt_init
>    virt_setup_lazy_vcpu_realization
>      arm_cpu_finalize_features
>      kvm_arm_create_host_vcpu
>        kvm_create_vcpu                       // New fd is created
>        kvm_arch_init_vcpu
>          kvm_arm_vcpu_init
>          kvm_arm_sve_set_vls
>          kvm_arm_vcpu_finalize               // (A) SVE capability is finalized
>
> device_set_admin_power_state
>    device_pre_poweron
>      virt_machine_device_pre_poweron
>        virt_cpu_pre_poweron
>          qdev_realize
>            arm_cpu_realizefn
>              cpu_exec_realizefn
>              arm_cpu_finalize_features       // Called for the second time
>              qemu_init_vcpu
>                kvm_start_vcpu_thread
>                  kvm_vcpu_thread_fn
>                    kvm_init_vcpu
>                      kvm_create_vcpu         // Called for the second time
>                      kvm_arch_init_vcpu      // Called for the second time
>                        kvm_arm_vcpu_init
>                        kvm_arm_sve_set_vls   // (B) Failed here
>                        kvm_arm_vcpu_finalize
>
> (B) where we try to finalize SVE capability again. It has been finalized at (A)
>      Fianlizing SVE capability for twice is disallowed by KVM on the host side.
>
>
> >>
> >> I picked the fix (the last patch in rfc-v6.2 branch) to rfc-v6 branch, same crash dump
> >> can be seen.
> >
> > Are you getting previously reported abort or above new problem?
> >
>
> Previously, the VM can't be started. After your fix is applied, the VM is able to start.
> It's a new problem that qemu crash dump is seens on attempt to hot add a vCPU.


Thanks for confirming this as well.

Cheers
Salil.


Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 1 day ago
Hi Gavin,

On Thu, Oct 23, 2025 at 12:35 AM Salil Mehta <salil.mehta@opnsrc.net> wrote:
>
> HI Gavin,
>
> On Thu, Oct 23, 2025 at 12:14 AM Gavin Shan <gshan@redhat.com> wrote:
> >
> > Hi Salil,
> >
> > On 10/23/25 4:50 AM, Salil Mehta wrote:
> > > On Wed, Oct 22, 2025 at 6:18 PM Salil Mehta <salil.mehta@opnsrc.net> wrote:
> > >> On Wed, Oct 22, 2025 at 10:37 AM Gavin Shan <gshan@redhat.com> wrote:
> > >>>
> > >>> Hi Salil,
> > >>>
> > >>> On 10/1/25 11:01 AM, salil.mehta@opnsrc.net wrote:
> > >>>> From: Salil Mehta <salil.mehta@huawei.com>
> > >>>>
> > >>>> ARM CPU architecture does not allow CPUs to be plugged after system has
> > >>>> initialized. This is a constraint. Hence, the Kernel must know all the CPUs
> > >>>> being booted during its initialization. This applies to the Guest Kernel as
> > >>>> well and therefore, the number of KVM vCPU descriptors in the host must be
> > >>>> fixed at VM initialization time.
> > >>>>
> > >>>> Also, the GIC must know all the CPUs it is connected to during its
> > >>>> initialization, and this cannot change afterward. This must also be ensured
> > >>>> during the initialization of the VGIC in KVM. This is necessary because:
> > >>>>
> > >>>> 1. The association between GICR and MPIDR must be fixed at VM initialization
> > >>>>      time. This is represented by the register
> > >>>>      `GICR_TYPER(mp_affinity, proc_num)`.
> > >>>> 2. Memory regions associated with GICR, etc., cannot be changed (added,
> > >>>>      deleted, or modified) after the VM has been initialized. This is not an
> > >>>>      ARM architectural constraint but rather invites a difficult and messy
> > >>>>      change in VGIC data structures.
> > >>>>
> > >>>> To enable a hot-add–like model while preserving these constraints, the virt
> > >>>> machine may enumerate more CPUs than are enabled at boot using
> > >>>> `-smp disabledcpus=N`. Such CPUs are present but start offline (i.e.,
> > >>>> administratively disabled at init). The topology remains fixed at VM
> > >>>> creation time; only the online/offline status may change later.
> > >>>>
> > >>>> Administratively disabled vCPUs are not realized in QOM until first enabled,
> > >>>> avoiding creation of unnecessary vCPU threads at boot. On large systems, this
> > >>>> reduces startup time proportionally to the number of disabled vCPUs. Once a
> > >>>> QOM vCPU is realized and its thread created, subsequent enable/disable actions
> > >>>> do not unrealize it. This behaviour was adopted following review feedback and
> > >>>> differs from earlier RFC versions.
> > >>>>
> > >>>> Co-developed-by: Keqian Zhu <zhuqian1@huawei.com>
> > >>>> Signed-off-by: Keqian Zhu <zhuqian1@huawei.com>
> > >>>> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > >>>> ---
> > >>>>    accel/kvm/kvm-all.c    |  2 +-
> > >>>>    hw/arm/virt.c          | 77 ++++++++++++++++++++++++++++++++++++++----
> > >>>>    hw/core/qdev.c         | 17 ++++++++++
> > >>>>    include/hw/qdev-core.h | 19 +++++++++++
> > >>>>    include/system/kvm.h   |  8 +++++
> > >>>>    target/arm/cpu.c       |  2 ++
> > >>>>    target/arm/kvm.c       | 40 +++++++++++++++++++++-
> > >>>>    target/arm/kvm_arm.h   | 11 ++++++
> > >>>>    8 files changed, 168 insertions(+), 8 deletions(-)
> > >>>>
> > >
> > > [...]
> > >
> > >>>> +void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > >>>> +{
> > >>>> +    CPUState *cs = CPU(cpu);
> > >>>> +    unsigned long vcpu_id = cs->cpu_index;
> > >>>> +    int ret;
> > >>>> +
> > >>>> +    ret = kvm_create_vcpu(cs);
> > >>>> +    if (ret < 0) {
> > >>>> +        error_report("Failed to create host vcpu %ld", vcpu_id);
> > >>>> +        abort();
> > >>>> +    }
> > >>>> +
> > >>>> +    /*
> > >>>> +     * Initialize the vCPU in the host. This will reset the sys regs
> > >>>> +     * for this vCPU and related registers like MPIDR_EL1 etc. also
> > >>>> +     * get programmed during this call to host. These are referenced
> > >>>> +     * later while setting device attributes of the GICR during GICv3
> > >>>> +     * reset.
> > >>>> +     */
> > >>>> +    ret = kvm_arch_init_vcpu(cs);
> > >>>> +    if (ret < 0) {
> > >>>> +        error_report("Failed to initialize host vcpu %ld", vcpu_id);
> > >>>> +        abort();
> > >>>> +    }
> > >>>> +
> > >>>> +    /*
> > >>>> +     * park the created vCPU. shall be used during kvm_get_vcpu() when
> > >>>> +     * threads are created during realization of ARM vCPUs.
> > >>>> +     */
> > >>>> +    kvm_park_vcpu(cs);
> > >>>> +}
> > >>>> +
> > >>>
> > >>> I don't think we're able to simply call kvm_arch_init_vcpu() in the lazily realized
> > >>> path. Otherwise, it can trigger a crash dump on my Nvidia's grace-hopper machine where
> > >>> SVE is supported by default.
> > >>
> > >> Thanks for reporting this. That is not true. As long as we initialize
> > >> KVM correctly and
> > >> finalize the features like SVE we should be fine. In fact, this is
> > >> precisely what we are
> > >> doing right now.
> > >>
> > >> To understand the crash, I need a bit more info.
> > >>
> > >> 1#  is happening because KVM_ARM_VCPU_INIT is failing. If yes, the can you check
> > >>        within the KVM if it is happening because
> > >>       a.  features specified by QEMU are not matching the defaults within the KVM
> > >>             (HInt: check kvm_vcpu_init_check_features())?
> > >>       b. or complaining about init feate change kvm_vcpu_init_changed()?
> > >> 2#  or it is happening during the setting of vector length or
> > >> finalizing features?
> > >>
> > >> int kvm_arch_init_vcpu(CPUState *cs)
> > >> {
> > >>     [...]
> > >>           /* Do KVM_ARM_VCPU_INIT ioctl */
> > >>          ret = kvm_arm_vcpu_init(cpu);   ---->[1]
> > >>          if (ret) {
> > >>             return ret;
> > >>         }
> > >>            if (cpu_isar_feature(aa64_sve, cpu)) {
> > >>          ret = kvm_arm_sve_set_vls(cpu); ---->[2]
> > >>          if (ret) {
> > >>              return ret;
> > >>          }
> > >>          ret = kvm_arm_vcpu_finalize(cpu, KVM_ARM_VCPU_SVE);--->[3]
> > >>          if (ret) {
> > >>              return ret;
> > >>          }
> > >>      }
> > >> [...]
> > >> }
> > >>
> > >> I think it's happening because vector length is going uninitialized.
> > >> This initialization
> > >> happens in context to  arm_cpu_finalize_features() which I forgot to call before
> > >> calling KVM finalize.
> > >>
> > >>>
> > >>> kvm_arch_init_vcpu() is supposed to be called in the realization path in current
> > >>> implementation (without this series) because the parameters (features) to KVM_ARM_VCPU_INIT
> > >>> is populated at vCPU realization time.
> > >>
> > >> Not necessarily. It is just meant to initialize the KVM. If we take care of the
> > >> KVM requirements in the similar way the realize path does we should be
> > >> fine. Can you try to add the patch below in your code and test if it works?
> > >>
> > >>   diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > >> index c4b68a0b17..1091593478 100644
> > >> --- a/target/arm/kvm.c
> > >> +++ b/target/arm/kvm.c
> > >> @@ -1068,6 +1068,9 @@ void kvm_arm_create_host_vcpu(ARMCPU *cpu)
> > >>           abort();
> > >>       }
> > >>
> > >> +     /* finalize the features like SVE, SME etc */
> > >> +     arm_cpu_finalize_features(cpu, &error_abort);
> > >> +
> > >>       /*
> > >>        * Initialize the vCPU in the host. This will reset the sys regs
> > >>        * for this vCPU and related registers like MPIDR_EL1 etc. also
> > >>
> > >>
> > >>
> > >>
> > >>>
> > >>> $ home/gavin/sandbox/qemu.main/build/qemu-system-aarch64           \
> > >>>     --enable-kvm -machine virt,gic-version=3 -cpu host               \
> > >>>     -smp cpus=4,disabledcpus=2 -m 1024M                              \
> > >>>     -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image    \
> > >>>     -initrd /home/gavin/sandbox/images/rootfs.cpio.xz -nographic
> > >>> qemu-system-aarch64: Failed to initialize host vcpu 4
> > >>> Aborted (core dumped)
> > >>>
> > >>> Backtrace
> > >>> =========
> > >>> (gdb) bt
> > >>> #0  0x0000ffff9106bc80 in __pthread_kill_implementation () at /lib64/libc.so.6
> > >>> #1  0x0000ffff9101aa40 [PAC] in raise () at /lib64/libc.so.6
> > >>> #2  0x0000ffff91005988 [PAC] in abort () at /lib64/libc.so.6
> > >>> #3  0x0000aaaab1cc26b8 [PAC] in kvm_arm_create_host_vcpu (cpu=0xaaaab9ab1bc0)
> > >>>       at ../target/arm/kvm.c:1081
> > >>> #4  0x0000aaaab1cd0c94 in virt_setup_lazy_vcpu_realization (cpuobj=0xaaaab9ab1bc0, vms=0xaaaab98870a0)
> > >>>       at ../hw/arm/virt.c:2483
> > >>> #5  0x0000aaaab1cd180c in machvirt_init (machine=0xaaaab98870a0) at ../hw/arm/virt.c:2777
> > >>> #6  0x0000aaaab160f220 in machine_run_board_init
> > >>>       (machine=0xaaaab98870a0, mem_path=0x0, errp=0xfffffa86bdc8) at ../hw/core/machine.c:1722
> > >>> #7  0x0000aaaab1a25ef4 in qemu_init_board () at ../system/vl.c:2723
> > >>> #8  0x0000aaaab1a2635c in qmp_x_exit_preconfig (errp=0xaaaab38a50f0 <error_fatal>)
> > >>>       at ../system/vl.c:2821
> > >>> #9  0x0000aaaab1a28b08 in qemu_init (argc=15, argv=0xfffffa86c1f8) at ../system/vl.c:3882
> > >>> #10 0x0000aaaab221d9e4 in main (argc=15, argv=0xfffffa86c1f8) at ../system/main.c:71
> > >>
> > >>
> > >> Thank you for this. Please let me know if the above fix works, and also
> > >> share the return values in case you encounter errors.
> > >
> > > I've pushed the fix to below branch for your convenience:
> > >
> > > Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
> > > Fix: https://github.com/salil-mehta/qemu/commit/1f1fbc0998ffb1fe26140df3c336bf2be2aa8669
> > >
> >
> > I guess the rfc-v6.2 branch isn't ready for testing because it runs into another
> > crash, as shown below.
>
>
> rfc-v6.2 is not crashing on the Kunpeng920 where I tested it. But this
> chip lacks some ARM extensions such as SVE, so unfortunately
> I can't test SVE/SME/PAuth support.
>
> Can you disable SVE and try again to see if it comes up, just to narrow
> down the case?
>
> >
> > host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
> >        -accel kvm -machine virt,gic-version=host,nvdimm=on                         \
> >        -cpu host,sve=on                                                            \
> >        -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
> >        -m 4096M,slots=16,maxmem=128G                                               \
> >        -object memory-backend-ram,id=mem0,size=2048M                               \
> >        -object memory-backend-ram,id=mem1,size=2048M                               \
> >        -numa node,nodeid=0,memdev=mem0,cpus=0-1                                    \
> >        -numa node,nodeid=1,memdev=mem1,cpus=2-3                                    \
> >        -L /home/gavin/sandbox/qemu.main/build/pc-bios                              \
> >        -monitor none -serial mon:stdio -nographic -gdb tcp::6666                   \
> >        -qmp tcp:localhost:5555,server,wait=off                                     \
> >        -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd      \
> >        -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
> >        -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
> >        -append memhp_default_state=online_movable
> >          :
> >          :
> > guest$ cd /sys/devices/system/cpu/
> > guest$ cat present enabled online
> > 0-3
> > 0-1
> > 0-1
> > (qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
> > qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (2): Operation not permitted
>
>
> Ah, I see. I think I understand the issue. It's complaining
> about calling finalize twice. Could you check this, as
> I do not have a way to test it?
>
>
> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> {
>     switch (feature) {
>     case KVM_ARM_VCPU_SVE:
>         [...]
>         if (kvm_arm_vcpu_sve_finalized(vcpu))
>             return -EPERM;  /* <-- this is where it must be failing? */
>         [...]
> }

I've pushed a fix to the same rfc-v6.2 branch that avoids finalizing the SVE
feature (KVM_ARM_VCPU_FINALIZE) twice.
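
To illustrate the failure mode only, here is a minimal, self-contained C
sketch. It is not the actual change on the branch. All names in it
(mock_vcpu, mock_host_finalize_sve, mock_finalize_sve_once) are hypothetical
stand-ins and not KVM or QEMU APIs. It models the kernel check quoted above,
where a second finalize returns -EPERM, and a caller-side flag that skips the
repeated request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vCPU state; not a KVM or QEMU type. */
struct mock_vcpu {
    bool host_sve_finalized;    /* models the host-side "already finalized" state */
    bool caller_sve_finalized;  /* models what the caller remembers having done */
    int host_calls;             /* number of finalize requests that reached the host */
};

/* Models the quoted kernel check: finalizing twice fails with -EPERM. */
static int mock_host_finalize_sve(struct mock_vcpu *vcpu)
{
    assert(vcpu != NULL);
    vcpu->host_calls++;
    if (vcpu->host_sve_finalized) {
        return -EPERM;
    }
    vcpu->host_sve_finalized = true;
    return 0;
}

/* Hypothetical caller-side guard: issue the finalize request at most once. */
static int mock_finalize_sve_once(struct mock_vcpu *vcpu)
{
    int ret;

    assert(vcpu != NULL);
    if (vcpu->caller_sve_finalized) {
        return 0;  /* already done earlier, nothing to ask the host */
    }
    ret = mock_host_finalize_sve(vcpu);
    if (ret == 0) {
        vcpu->caller_sve_finalized = true;
    }
    return ret;
}
```

Calling mock_host_finalize_sve() twice on the same vCPU yields 0 and then
-EPERM, whereas calling mock_finalize_sve_once() twice yields 0 both times and
reaches the mock host only once.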

Could you please validate the fix again and check that SVE works on
NVIDIA Grace Hopper?

Many thanks!

Best regards
Salil.
Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Gavin Shan 3 weeks, 1 day ago
Hi Salil,

On 10/23/25 11:29 AM, Salil Mehta wrote:

[...]

>>
>> Ah, I see. I think I understand the issue. It's complaining
>> about calling finalize twice. Could you check this, as
>> I do not have a way to test it?
>>
>>
>> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
>> {
>>     switch (feature) {
>>     case KVM_ARM_VCPU_SVE:
>>         [...]
>>         if (kvm_arm_vcpu_sve_finalized(vcpu))
>>             return -EPERM;  /* <-- this is where it must be failing? */
>>         [...]
>> }
> 
> I've pushed a fix to the same rfc-v6.2 branch that avoids finalizing the SVE
> feature (KVM_ARM_VCPU_FINALIZE) twice.
> 
> Could you please validate the fix again and check that SVE works on
> NVIDIA Grace Hopper?
> 

With the latest rfc-v6.2 branch, I no longer hit the issue. vCPUs can be hot-added
and hot-removed on the Grace Hopper host.

host$ /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
       -accel kvm -machine virt,gic-version=host,nvdimm=on                         \
       -cpu host,sve=on                                                            \
       -smp maxcpus=4,cpus=2,disabledcpus=2,sockets=2,clusters=2,cores=1,threads=1 \
       -m 4096M,slots=16,maxmem=128G                                               \
       -object memory-backend-ram,id=mem0,size=2048M                               \
       -object memory-backend-ram,id=mem1,size=2048M                               \
       -numa node,nodeid=0,memdev=mem0,cpus=0-1                                    \
       -numa node,nodeid=1,memdev=mem1,cpus=2-3                                    \
       -L /home/gavin/sandbox/qemu.main/build/pc-bios                              \
       -monitor none -serial mon:stdio -nographic -gdb tcp::6666                   \
       -qmp tcp:localhost:5555,server,wait=off                                     \
       -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd      \
       -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
       -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
       -append memhp_default_state=online_movable
          :
          :
guest$ cd /sys/devices/system/cpu
guest$ cat present enabled online
0-3
0-1
0-1
(qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=enable
guest$ echo 1 > cpu2/online
guest$ cat present enabled online
0-3
0-2
0-2
         :
         :
guest$ cd /sys/devices/system/cpu
guest$ cat present enabled online
0-3
0-2
0-2
(qemu) device_set host-arm-cpu,socket-id=1,cluster-id=0,core-id=0,thread-id=0,admin-state=disable
guest$ cat present enabled online
0-3
0-1
0-1
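
For anyone who wants to script the check above, here is a minimal C sketch that
counts the CPUs in a sysfs-style list such as the present/enabled/online values
shown in the transcript. The helper name count_cpus_in_list is hypothetical and
not part of QEMU or the kernel. It assumes the list is made of comma-separated
decimal numbers or ranges, optionally ending in a newline.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a list such as "0-3" or "0-1,4".
 * Returns the number of CPUs, or -1 if the input is malformed.
 */
static int count_cpus_in_list(const char *list)
{
    const char *p = list;
    int count = 0;

    assert(list != NULL);

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last;

        if (end == p || first < 0) {
            return -1;              /* no digits, or a negative number */
        }
        last = first;
        p = end;

        if (*p == '-') {            /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first) {
                return -1;
            }
            p = end;
        }
        count += (int)(last - first + 1);

        if (*p == ',') {
            p++;                    /* move on to the next entry */
        } else if (*p != '\0' && *p != '\n') {
            return -1;              /* unexpected character */
        }
    }
    return count;
}
```

With the values from the transcript, count_cpus_in_list("0-3") gives 4 present
CPUs, and "0-2" versus "0-1" gives 3 versus 2 enabled CPUs before and after the
disable request.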

Thanks,
Gavin
Re: [PATCH RFC V6 05/24] arm/virt,kvm: Pre-create KVM vCPUs for 'disabled' QOM vCPUs at machine init
Posted by Salil Mehta 3 weeks, 1 day ago
[!] Resending this to keep the conversation *legally* correct, since
it did not appear on the mailing list when sent from my official ID.

Sorry for any inconvenience this has caused.

On Thu, Oct 23, 2025 at 4:14 AM Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Salil,
>
> On 10/23/25 11:29 AM, Salil Mehta wrote:
>
> [...]
>
> >>
> >> Ah, I see. I think I understand the issue. It's complaining
> >> about calling finalize twice. Could you check this, as
> >> I do not have a way to test it?
> >>
> >>
> >> int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
> >> {
> >>     switch (feature) {
> >>     case KVM_ARM_VCPU_SVE:
> >>         [...]
> >>         if (kvm_arm_vcpu_sve_finalized(vcpu))
> >>             return -EPERM;  /* <-- this is where it must be failing? */
> >>         [...]
> >> }
> >
> > I've pushed a fix to the same rfc-v6.2 branch that avoids finalizing the SVE
> > feature (KVM_ARM_VCPU_FINALIZE) twice.
> >
> > Could you please validate the fix again and check that SVE works on
> > NVIDIA Grace Hopper?
> >
>
> With the latest rfc-v6.2 branch, I no longer hit the issue. vCPUs can be hot-added
> and hot-removed on the Grace Hopper host.

Excellent. SVE/SME and other ARM extensions had not been tested before.
It would be an immense help if all of these could be validated, as I do not have
capable hardware to test them.

Many thanks for your proactive efforts in reporting, reviewing the fixes and
validating them as well. I appreciate it!

For anyone who wants to try it, the fix is here:

Branch: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6.2
Fix: https://github.com/salil-mehta/qemu/commit/cd58e65a79c224a59407553c1a6288ed667b19ed


Many thanks!
Salil.